INTERACTION AND INSIGHT IN GROUP PSYCHOTHERAPY¹


W. H. COONS 
Ontario Hospital and McMaster University, Hamilton 


ALL MAJOR SCHOOLS of psychotherapy have encountered difficulty in
explaining how and why therapeutic changes occur during psycho- 
therapy. Since Freud’s development of psychoanalysis the primary stress 
in all systematic psychotherapies has been on insight as the core of adjust- 
ment. However, no unequivocal relationship between degree of insight 
and level of adjustment has yet been demonstrated. If insight is considered 
to be a cognitive act by which the significance of some pattern of rela- 
tions is understood (12), clinical experience does not wholly support its 
use as an explanatory concept. Improved adjustment occurs in persons 
who have not shown evidence of increased insight; other persons who are 
thought to have gained insight remain seriously maladjusted. This sug- 
gests that there is a more basic explanation for behavioural changes 
which occur during psychotherapy. 

The literature reveals two lines of evidence which converge to support 
the assumption that insight is not the crucial condition for change in 
behaviour. All major psychotherapeutic systems recognize that insight 
alone is ineffective and make careful provision for interpersonal inter- 
action (1, 2, 5, 11, 13, 15). At the same time, current research on per- 
sonality development suggests that understanding is not enough to 
assure adaptive learning, and that adjustment to reality depends on 
opportunities for the repeated trial and check of an individual’s expecta- 
tions (7, 10, 14). Both these trends suggest that opportunity for inter- 
personal interaction in a consistently warm and accepting social environ- 
ment is central to psychotherapy. 

From this theoretical orientation it was hypothesized that interaction, 
rather than insight, is responsible for therapeutic improvement. There- 
fore, it was predicted that a technique of psychotherapy which stressed 
interpersonal interaction in the absence of insightful content would be 
superior, in effecting improved adjustment, to a technique which stressed 
insight with minimal interaction. 


1Adapted from part of a thesis submitted in 1955 in partial fulfilment of the require- 
ments for the degree of Doctor of Philosophy at the University of Toronto. The author 
is indebted particularly to Dr. J. N. Senn and Dr. C. R. Myers for their assistance in 
the conception and development of this study. 
The present study was designed to test this hypothesis experimentally
by comparing the effects on adjustment of two techniques of group 
psychotherapy. One of these (Interaction Therapy) fostered group inter- 
action in the absence of the usual concern for imparting insight; the 
other (Insight Therapy) strove to impart insight while holding group 
interaction to a minimum. A control group which received no group 
psychotherapy was included in the study as an index of the absolute 
efficacy of the two therapeutic techniques. 


METHOD 
The Measures of Adjustment 

For reasons outlined in the original report of the study (4), the Rorschach 
Technique of Personality Diagnosis (8) and the Wechsler-Bellevue Adult Intelligence 
Scales (16, 17) were chosen as the best available indices of adjustment. 

The Rorschach. The Rorschach was administered to each of 64 research subjects 
before and after the period of therapy. Pre- and post-therapy protocols for each 
subject were analysed “blind” by an experienced Rorschach examiner.² The examiner
knew only that each pair of protocols was from the same subject, and that one of the 
protocols was the pre-therapy, the other the post-therapy record. He did not know 
which was which, nor did he know in which type of group the subject had been. 
This procedure eliminated any personal biases due to acquaintance with the subject 
or active espousal of either therapeutic technique. 

The examiner was instructed to select in each pair the protocol which represented 
the better level of adjustment. When the protocol which he judged to be “better” 
was the post-therapy record, that subject was considered to have shown improved 
adjustment during the therapy period. The examiner was unable to distinguish 
differences between the protocols of eight patients. These patients were considered 
to have shown no improvement. 

As a reliability check, another experienced Rorschach examiner³ independently
examined 20 of the 64 pairs of Rorschach protocols. His selections corresponded with 
those of the first examiner in 19 out of the 20 pairs. 

The Wechsler-Bellevue Adult Intelligence Scales. Pre- and post-therapy intellectual 
efficiency was measured with Forms I and II of the Wechsler-Bellevue (16, 17). 
Since available evidence shows no significant difference between the Verbal Scale, 
Performance Scale, and Full Scale Intelligence Quotients of these two scales (6, 17), 
they were treated as equivalent, the two forms being administered alternately to 
each patient, i.e., when one form of the scale was used for the pre-therapy assessment, 
the other was used for the post-therapy assessment. 


Research Design 


The research involved the operation of two experimental units (one male, one 
female), each of which was composed of three different types of group: 

1. Interaction groups. Members of this type of group were subject to the usual
hospital routines with the addition for three hours each week of a type of group 


2K. G. Ferguson, Westminster Hospital, London, Ontario. 
3F. W. Burd, Westminster Hospital, London, Ontario.
experience which will be referred to as interaction group psychotherapy. The
technique used was designed to create a warm, acceptant, and permissive atmosphere 
in which patient-to-patient interaction was encouraged. Interaction was considered 
to be any type of verbal communication on any subject. There was little or no reference 
to the content usually associated with psychotherapy, and discussion ranged from the 
price of fur coats to the progress of industrial expansion in Newfoundland. Thus the 
interaction groups were characterized by emphasis on maximum interaction in the 
absence of insightful material. 


2. Insight groups. Members of this type of group were subject to the usual hospital 
routines with the addition three times each week of a type of group experience which 
will be referred to as insight group psychotherapy. The technique involved directed 
discussion of the aetiology, manifestations, and control of psychological disturbances. 
Personal involvement on the part of each group member was encouraged. Each 
member was directively encouraged to examine his personal difficulties, their origins, 
and their solution. Emotional catharsis was common. The therapeutic climate might 
best be described as benignly authoritarian. Interaction was restricted to that between 
patient and therapist. Thus the insight groups were characterized by maximum 
emphasis on insight with minimum interaction. 


3. Control groups. Members of the control groups received no planned group
psychotherapy, but were subject to the usual hospital routines. They did not realize 
that they were considered as a group. 

To ensure that the two types of therapy groups actually showed the difference in 
amount of interaction required by the research design, all available transcripts of the 
recordings of the therapy sessions were analysed after the method of Bovard (3) to 
derive “interaction ratios.” These ratios confirm that the interaction group sessions 
were characterized by a high proportion of patient-to-patient interaction, and that 
this was largely absent in the insight group sessions. (The difference in ratios is highly 
significant: C.R. = 12.8, P < .001.)


Subjects 

A total of 66 patients at the Ontario Hospital, Hamilton, were used as research 
subjects. These were all the patients for whom complete, or nearly complete pre- and 
post-therapy data were available. 


Selection. Each subject was selected on the basis of suitability for group psycho- 
therapy. The selection procedures took no account of psychiatric nosology. The 
majority of the patients chosen turned out to be classified as Schizophrenic. 

To reduce the possibility that any one group might be formed of patients with an 
especially favourable prognosis, patients selected for therapy were randomly assigned 
to each group of each unit. Comparison of the groups on such factors as economic 
status, occupational classification, nosological grouping, educational level, I.Q., age,
duration of illness and hospitalization, number of E.C.T., etc., indicated that the 
randomization procedures were successful in producing comparable groups (4). 


Replacement. Initially, 21 patients were selected for each unit and randomly 
assigned to the three groups comprising it. However, group members were subject 
to the ordinary hospital routines, including those regarding discharge. Thus, when a 
group member was considered suitable for release by the physician in charge, or 
when relatives insisted on taking him home, he was discharged from the hospital. 

Replacements were taken in sequence as required, from a reserve of patients 
considered suitable for group psychotherapy. When a vacancy occurred in any
group of a unit it was filled with the patient whose name headed the reserve list. 
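
The assignment and replacement rules described above can be sketched in a few lines. This is an illustration only: the function names, the patient labels, and the use of a seeded shuffle are assumptions, since the report does not state how the random assignment was carried out.

```python
import random
from collections import deque

GROUPS = ("interaction", "insight", "control")
GROUP_SIZE = 7  # seven members per group, 21 per unit, as stated in the report


def assign_unit(patients, seed=None):
    """Randomly split the 21 patients of one unit into three groups of seven."""
    if len(patients) != GROUP_SIZE * len(GROUPS):
        raise ValueError("a unit needs exactly 21 patients")
    shuffled = list(patients)
    random.Random(seed).shuffle(shuffled)  # hypothetical randomization device
    return {
        group: shuffled[i * GROUP_SIZE:(i + 1) * GROUP_SIZE]
        for i, group in enumerate(GROUPS)
    }


def fill_vacancy(unit, group, discharged, reserve):
    """Replace a discharged member with the patient heading the reserve list."""
    unit[group].remove(discharged)
    unit[group].append(reserve.popleft())


# Usage with made-up patient labels.
unit = assign_unit([f"P{i:02d}" for i in range(21)], seed=0)
reserve = deque(["R1", "R2", "R3"])
fill_vacancy(unit, "control", unit["control"][0], reserve)
```

Taking replacements strictly from the head of the reserve list keeps the choice of replacement independent of the group in which the vacancy happens to occur.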


Procedure 


The experimental procedure lasted for fifteen months, during which the composition 
of the six groups altered as patients were released and replaced by others (see above). 
At any given time there were seven members in each group, 21 in each experimental 
unit (male and female). Each psychotherapy group had three group sessions, each 
of one-hour duration, each week. Two therapists were involved, one working with 
the male unit, the other with the female unit. All group sessions were recorded 
electrically and transcripts of samples made at regular intervals. 

Duration of therapy. Owing to the fluid membership of the groups, duration of 
therapy was not the same for all subjects. However, no member was used as a research 
subject unless he had a minimum of eight hours of psychotherapy. The maximum 
number of hours for any subject was ninety. In this respect, members of the control 
groups were dealt with as though they were members of a group receiving therapy. 
That is, at any time they were considered to have had the number of sessions which 
they would have had if they had actually been assigned to a psychotherapy group when 
first included as a “control” subject. The average number of group sessions of patients 
in the Interaction Group was 27 (Range: 8-60), the Insight Group 32 (Range: 9-87), 
and the Control Group 47 (Range: 12-90). 


Testing. The Rorschach and Wechsler tests were administered individually before
and after therapy by members of the psychological staff at the Ontario Hospital,
Hamilton.


Statistical Analysis 

The study was designed to permit comparison of the effects of three different types 
of experience: interaction group psychotherapy, insight group psychotherapy, and no 
group psychotherapy. Differences were held to be significant when they made possible 
the rejection of the null hypothesis at the 5 per cent level of confidence. Techniques, 
and methods of reporting results, follow McNemar (9). The devices used are: 
standard error of the difference between percentages, and analysis of variance for
small samples. No t-scores are considered significant unless they occur in the context
of significant over-all F-ratios.


RESULTS
Rorschach


As is shown in Table I, 16 out of 22 subjects in the Interaction Group,
10 out of 23 subjects in the Insight Group, and 7 out of 19 subjects in
the Control Group were rated as showing improved adjustment following
the therapy period.


TABLE I
NUMBERS OF SUBJECTS IN EACH GROUP RATED AS
IMPROVED AND UNIMPROVED AFTER THERAPY

Group          Improved   Unimproved   Total
Interaction       16           6         22
Insight           10          13         23
Control            7          12         19

Table II shows that the percentage of improved subjects in the 
Interaction Group was significantly greater than the percentages of 
improved subjects in either the Insight Group (P=.04) or the Control 
Group (P=.02). The difference between the Insight and Control Groups 
was not significant. 


TABLE II
DIFFERENCES IN PERCENTAGES OF SUBJECTS SHOWING
IMPROVEMENT IN INTERACTION, INSIGHT, AND CONTROL GROUPS

                       Difference
Groups compared        in per cent     Dp      CR      P
Interaction–Insight        31         14.7     2.1    0.04
Interaction–Control        37         15.5     2.4    0.02
Insight–Control            12         15.2     0.8    N.S.
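
The critical ratios in Table II rest on the standard error of the difference between two uncorrelated percentages. The sketch below assumes the pooled-proportion form of that standard error; the report cites McNemar (9) for its methods but does not print the formula, so the function and argument names here are illustrative. Applied to the counts in Table I, it gives standard errors of about 14.7, 15.5, and 15.2 per cent, in line with the Dp column; the percentage differences computed directly from those counts may not agree exactly with the printed values.

```python
from math import sqrt


def critical_ratio(improved_1, n_1, improved_2, n_2):
    """Return (difference in per cent, standard error in per cent, C.R.).

    Assumes two independent groups and a pooled estimate of the
    proportion improved (an assumption, not stated in the report).
    """
    p_1 = improved_1 / n_1
    p_2 = improved_2 / n_2
    pooled = (improved_1 + improved_2) / (n_1 + n_2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    difference = p_1 - p_2
    return 100 * difference, 100 * std_error, difference / std_error


# Counts taken from Table I: (improved, total) for each group.
interaction, insight, control = (16, 22), (10, 23), (7, 19)

for label, first, second in [
    ("Interaction-Insight", interaction, insight),
    ("Interaction-Control", interaction, control),
    ("Insight-Control", insight, control),
]:
    diff, se, cr = critical_ratio(*first, *second)
    print(f"{label}: difference {diff:.1f}, S.E. {se:.1f}, C.R. {cr:.2f}")
```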


Wechsler-Bellevue 


Table III indicates that significant intergroup differences in the
amount of change in Wechsler Full Scale IQ occurred during the therapy
period (F=5.8, P=.01). The Interaction Group showed improvement
which was not approached by either the Insight Group (t=3.14,
P=.003) or the Control Group (t=2.63, P=.01). The difference in mean
changes in IQ between the Insight and Control Groups was not significant.


TABLE III
DIFFERENCES BETWEEN PRE- AND POST-THERAPY WECHSLER-BELLEVUE IQ's
OF (a) INTERACTION, (b) INSIGHT, AND (c) CONTROL GROUPS

                                                      t-scores of differences
                   Mean change in I.Q.                     between means
                   (a)     (b)     (c)    F-ratio†   (a) & (b)  (a) & (c)  (b) & (c)
Verbal IQ          4.4    -0.5     1.7     2.0*         2.0        1.0        0.9
Performance IQ     7.6     2.1     1.0     2.2          1.7        1.9        0.3
Full IQ            9.0     0       1.0     3.3**        3.14**     2.6**      0.3

†With 2 and 62 degrees of freedom.
*Significant at the 5% level.
**Significant at the 1% level.

No significant intergroup differences in change on the Verbal and 
Performance Scales of the test are evident. This suggests that the
change in intellectual efficiency was a function of a general improvement
on all sub-tests. For confirmation of this suggestion, the significance
ment on all sub-tests. For confirmation of this suggestion, the significance 
of the intergroup differences of mean changes on sub-test weighted 
scores was tested. The results show only two exceptions, the Compre- 
hension and Digit Symbol sub-tests. The Interaction Group showed 
improvement on Comprehension which was significantly greater than 
that shown by the Insight Group (t=2.57, P=.01). The change in 
performance of the Insight and Control Groups on this sub-test was 
not significant. On Digit Symbol, also, the Interaction Group showed 
greater improvement than did the Insight Group (t=2.28, P=.03) or 
the Control Group (t=2.21, P=.03). 

In summary, the Interaction Group showed a generally greater im- 
provement in intellectual efficiency, which was most marked in the areas 
of general comprehension and new learning ability. 


DISCUSSION

We have found two indications that the interaction technique was 
more effective in producing improved adjustment than the insight tech- 
nique. This evidence supports the hypothesis that it is interaction rather 
than insight that is the basis of psychotherapeutic change; it casts 
serious doubt on the generally accepted belief that the chief purpose of 
psychotherapy is to facilitate insight. It suggests that the interpersonal 
interaction which characterizes both individual and group psychotherapy 
may, in itself, be the crucial factor in the production of therapeutic 
change. If this is true, explicit recognition of interaction as the prime 
therapeutic agent would require reorientation of our therapeutic efforts. 

Psychotherapists from Freud onward have found it necessary to deviate 
in practice from those parts of their theoretical systems which stressed 
insight as the medium of therapeutic change. While insight has been 
retained as the explanatory concept, actual practice has moved pro- 
gressively towards techniques which maximize patient-to-therapist or 
patient-to-patient interaction, in special types of acceptant atmospheres. 
Modern group therapy has been the culmination of this trend. Rational 
man is loath to surrender voluntarily any aspect of his behaviour to a 
non-cognitive process, since this implies that he is not complete master of 
his fate. In theory, psychotherapists have (with some difficulty) resisted 
surrender; in practice, the surrender is becoming ever more complete. 
Practical considerations have thus resulted in the modification of thera- 
peutic techniques along the lines suggested by the present study. Ex- 
plicit recognition of interaction as basic to therapeutic changes could 
provide the foundation for further and much needed technical advances. 
SUMMARY


Sixty-six hospital patients were divided at random into three groups: 


(a) A group which experienced a technique of group psychotherapy 
which stressed interpersonal interaction in a warm, permissive, thera- 
peutic climate and made no reference to personal difficulties; 


(b) A group which experienced a technique of group psychotherapy 
which stressed cognitive understanding of personal difficulties (insight) 
in a benignly authoritarian therapeutic climate; 


(c) A control group which experienced no group psychotherapy. 


The first group showed significantly greater improvement in adjust- 
ment than did either of the other two groups. 


From these results the following conclusions were drawn: 

(1) In group psychotherapy, greater improvement results from a 
technique which stresses interaction than from a technique which 
stresses insight. 

(2) Therapeutic change can and does occur as a result of controlled 
interaction in the absence of traditional insight methods of psycho- 
therapy. 

(3) Since there were no apparent differences in results between an 
insight technique and “no treatment,” interaction rather than insight 
seems to be the essential condition for therapeutic change. 
AN APPROACH TO THE MEASUREMENT OF
CREATIVE THINKING 


B. M. SPRINGBETT, J. G. DARK,¹ AND J. CLAKE²
University of Manitoba 


IT has long been recognized that thinking and problem-solving involve
unconscious³ processes (2). On the basis of arguments offered later, we
should like to suggest that “creative” thinking differs from conventional 
problem-solving only because it involves a greater sensitivity to such 
unconscious processes. Hence a test which will measure creative thinking 
must show some relation to tests of reasoning, intelligence, and so forth, 
and yet, at the same time, it must measure a degree of sensitivity to 
unconscious processes greater than that required by such tests of 
reasoning. 

One solution to this problem of measuring the contribution of uncon- 
scious processes is offered here in the form of what we call the Lines Test. 
Evidence of its empirical validity is offered in terms of correlations with 
four criterion tests of reasoning and intelligence. In addition, a claim is 
made for “face” validity, but with no implication that this removes the 
need for an empirical demonstration that the test measures creative 
ability. 

The discussion of creative thinking and the analysis of conventional 
tests of reasoning which follow are intended to provide a rationale for the 
Lines Test, and to define the requirements it must meet. The point of 
view expressed here could be accommodated by a number of formal 
theories of thinking, such as those of Hebb (1), Osgood (5), and 
Malzburg (3). 


THE NATURE OF CREATIVE THINKING 
The unconscious aspects of creative thinking are suggested by the fact 
that the end-product, the solution, is open to introspection but the pro- 
cesses leading up to it are not. A logical validation of the solution can 
frequently be worked out, but it is doubtful whether this logical chain 


1Now at MacDonald R.C.A.F. Station, Manitoba. 

2Now at the University of Toronto. The data in Experiment II are from a study 
completed at Manitoba. 

3This term is used with reluctance. The “content” to which it refers is akin
to the Freudian pre-conscious. See below. 
Canad. J. Psychol., 1957, 11 (1).
can be identified with the processes responsible for the solution. This is
indicated by the fact that the logical validation is often difficult to con- 
struct, and that once the logical relations are clear the solution frequently 
carries new meanings and implications. 

Another characteristic of creative thought is that the solution is both 
new and valid. It creates new meanings in the phenomenal world and so 
re-structures it. 

If creative thought is to be identified both with the newness and the 
validity of its product, a dual set of determinants is implied. The loose 
work of dream and phantasy produces much that is new, but the products 
possess little validity. The application of a set of conventional rules to a 
set of data produces valid results, but little that is new in the creative 
sense. Creativity requires the flexibility of phantasy to seek out what is 
new, but the result must be tied to the conventional to secure validity. 

For example, in the application of mathematics to chemistry there are 
a number of possible approaches, all equally promising from a mathe- 
matical viewpoint; but only some of these prove to be valid for chemistry. 
The conditions of freedom lie within the complex, conventional mathe- 
matical field. The creative act employs this freedom to seek out the 
mathematically conventional line which has valid linkages with what is 
conventional within the framework of chemistry. The greater the im- 
probability of such linkages the more creative do we consider the act. 
The improbability appears as a function of dissimilarity between the 
newly discovered and previously known linkages.

Creative thinking is the process by which such improbable linkages are
discovered; and, if we accept Hebb’s⁴ view of consciousness, it follows
that creative thinking is a function of the relative strengths of conscious 
and unconscious processes. 

The assumption is that consciousness is no more and no less than a 
fully activated, well-organized neural pattern. When we are confronted 
with a problem, and with the relevant data, a wide range of such 
organizations is available, but some will have higher probabilities of 
being activated than others. Those with the highest probability are the 
ones most highly organized with respect to the particular problem and 
the data; in short, the best learned organizations. These will also be, for 
a given individual, the most conventional and commonplace modes of 
organizing the data. Thus the initial conscious reaction has a high prob- 
ability of being conventional, valid, but not new. 


4The development here may not fit Hebb’s views in all respects. Osgood (5, p. 40) 
suggests a similar point of view. A philosophical system which accommodates this 
position in considerable detail is that presented in S. Alexander's Space, Time, and 
Deity (London: Macmillan, 1920). 
Other neural organizations, with lower probabilities of being engaged,
may be thought of as being partially activated. These may modify the 
patterning of the fully activated (conscious) organizations and produce 
deviations from the usual conscious content. Such interactions may 
enhance or detract from the possibility of a solution at the conscious 
level. Solutions found under these conditions would be novel ones, though 
still closely tied to the conventional. 


Since, by our definition, the creative act requires the discovery of im- 
probable relations, it follows that a creative solution is to be found 
through the engagement of the partially activated (unconscious) or- 
ganizations. 

When the well-organized patterns fail to produce a solution, the 
probabilities of the partially activated organizations becoming fully 
activated, in relation to the data, will increase. This will be due to in- 
hibitory factors operating in the well-organized patterns, together with 
recruitment of strength in the partially activated ones. The latter may 
arise from simple summation effects of partial activation, but other factors 
are likely to play a part. The probabilities of the partially active becoming 
conscious might be expected to vary with changes in the sequence in 
which the data are viewed, with reformulations of the problem, with 
leaving the problem and returning to it, and so on. 

If we accept the above rough description of problem-solving, the 
creative thinker will be one in whom there is a sensitive interplay among 
the possibly available organizations. The conventional problem-solver 
will be dominated by one set of such organizations, namely, those learned 
specifically in relation to the type of problem confronting him. 


CONVENTIONAL TESTS 


Conventional tests of reasoning, intelligence, and the like, all demand 
the creative type of interaction in some degree. However, they afford no 
means of assessing the contribution of unconscious factors to the solution, 
nor do they offer a situation which demands a high degree of sensitivity 
to such interaction, that is, they fail to provide optimum conditions for
the creative act to occur. 

The most formidable barrier to creative thought in the majority of con- 
ventional tests is that the principle under which the problems are to be 
solved is given, in the instructions, or through practice questions. Only 
with this sort of information can the subject meet the conditions of the 
test. The creative thinker, in contrast, must discover the principle, the 
nature of the organization, under which the solution is possible. 

A second defect lies in the sequence in which the elements of a specific 
problem are presented. Verbal problems are put into logical form. 
Matrices of various sorts are found in orderly progression so that,
although there is a hidden principle, its discovery is, in a sense, imposed 
on the subject. Or, if the elements have a disorderly array, as in the 
jumbled sentence, rules of grammar and habits of speech automatically 
provide the basis for an orderly and conventional arrangement. On the 
other hand, the creative act must discover, in a disorderly array, the 
principle which will make it meaningful and orderly. 

Finally, in the conventional tests all the data of a specific problem are 
perceptually present. The words, figures, or other information are all 
before the subject, so that they can be immediately checked if memory 
is doubted or difficulty arises. In the creative situation the various 
elements are usually simultaneously present only in ideational form. 

This line of reasoning points to the following as the requirements of a 
test of creative thinking: (a) The elements of the problem must permit 
organization (possess relations), but the directions given in the test must 
yield no hint that such organization exists; (b) the elements of the prob- 
lem must be capable of being presented in such a way that the eduction 
of relations is difficult; (c) the various elements must be presented 
separately, so that they can be simultaneously present to the subject 
only ideationally; (d) the test must yield some measure of the interaction 
of conscious and unconscious processes. 


THE LINES TEST


After a number of attempts to meet the foregoing requirements in a 
test of thinking, a solution of the problem finally presented itself in the 
form of a test of immediate memory. This is described below, and
followed by a discussion of how it meets the above requirements. 

Figure 1(a) shows three figures, each of which is drawn with nine 
straight lines. Since names can be attached to these figures (box, prism, 
chair), they are called meaningful figures (M). Figure 1(b) shows three 
figures, each of nine lines, which have symmetry but are difficult to name. 
These are called gestalt figures (G). Figure 1(c) shows three nonsense 
figures (N) without symmetry or meaning, each consisting of nine lines. 

In the test situation the subject is shown the nine lines of one figure, for 
example, the box. These are presented one at a time, each line being 
drawn on a 4 x 4 grid. When the nine lines have been presented in succession
he is asked to draw all of the lines he can remember on a single
4 x 4 grid. If he is entirely successful he will, of course, reproduce the 
box on his answer sheet. Each of the nine figures is similarly presented, 
and he is asked to reproduce the component lines of each from memory. 
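
As a rough illustration of how one trial might be recorded and scored, the sketch below treats each line as an unordered pair of grid points and counts the presented lines that reappear in the reproduction. The coordinates, the box-like example figure, and the scoring rule (number of lines correctly reproduced) are all assumptions made for this example; the actual figures and scoring conventions are not specified in this form in the text.

```python
def line(start, end):
    """A line on the 4 x 4 grid, stored so that drawing direction is ignored."""
    return frozenset((start, end))


def score_recall(presented, reproduced):
    """Count presented lines that also appear in the subject's reproduction."""
    return len(set(presented) & set(reproduced))


# A hypothetical nine-line, box-like figure on grid points (0..4, 0..4).
box = [
    line((0, 0), (3, 0)), line((3, 0), (3, 3)),   # front face
    line((3, 3), (0, 3)), line((0, 3), (0, 0)),
    line((0, 3), (1, 4)), line((3, 3), (4, 4)),   # receding edges
    line((3, 0), (4, 1)),
    line((1, 4), (4, 4)), line((4, 1), (4, 4)),   # back edges
]

# A made-up reproduction: six correct lines (some drawn in the opposite
# direction) and one line that was never presented.
reproduction = [
    line((3, 0), (0, 0)), line((3, 3), (3, 0)), line((0, 3), (3, 3)),
    line((0, 0), (0, 3)), line((1, 4), (0, 3)), line((4, 4), (1, 4)),
    line((0, 0), (4, 4)),
]

score = score_recall(box, reproduction)
```

Storing each line as an unordered pair means a line drawn from right to left is credited the same as one drawn from left to right.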

[Figure 1: (a) meaningful figures M1, M2, M3; (b) gestalt figures G1, G2, G3; (c) nonsense figures N1, N2, N3.]

FIGURE 1. Figures used in Experiments I and II.

With respect to the meaningful figures it is clear that (a) the nine
lines have meaningful organization, but no hint has been given to the
subject that such an organization exists; (b) the nine lines can be
presented in a sequence which will make the eduction of relations,
relevant to the over-all organization, difficult; (c) since the lines are
presented singly, they can be simultaneously present to the subject only
ideationally; (d) if the subject fails to reproduce all nine lines we may
infer that he is unconscious of the over-all organization.
If the subject's performance on the meaningful figures, though not
perfect, is superior to that on the nonsense figures, this may be attributed 
to the organization in the former of which the subject is not aware. If 
performance on the meaningful figures is superior to that on the gestalt 
figures, this will guard against the criticism that the lines of the meaningful
figures were better remembered than those of the nonsense figures because the
subject became aware of sub-organizations such as squares, triangles, etc. 
On these grounds the test appears capable of measuring the interaction of 
conscious and unconscious processes, and this, we suggest, is the essential 
problem in measuring creative thinking. 

The conscious organization by which the subject will achieve rote 
memory of the lines is related to the spatial framework in which each line 
is presented. The role of the unconscious organization is viewed as 
follows: it is assumed that the nine lines of a meaningful figure, separately 
presented, will partially activate the neural organization normally in- 
volved in perceiving such a figure; when the subject attempts to recall 
these lines they will be better remembered by virtue of their membership 
in this organization, provided the subject is not dominated by his 
conscious processes but is sensitive to those which are unconscious. 

Since, on the whole, such interaction will aid recall, the demand will
be, for group results, that the meaningful figures be better remembered 
than the gestalt or nonsense, and the gestalt better than the nonsense. 

If sensitivity to such interaction is required to a high degree in creative 
thinking and to a lesser degree in reasoning and problem-solving, scores 
on the Lines Test should show the following relationships with scores on 
tests of reasoning and intelligence: scores on the meaningful and gestalt 
figures should show significant correlations with such tests, because they 
possess conventional organizations which provide the basis of the 
postulated interaction. The scores on the nonsense figures, being wholly 
dependent on conscious processes (and possessing the character of rote 
memory), should not show significant correlations with the scores of such
tests. 

The nature or patterning of the predicted correlations may be under- 
stood better by considering the structure of the variance presumed to be 
present in the scores of each class of figures. The chief sources of 
variance in the meaningful figures will be (a) rote memory, (b) an 
unconscious spatial component, (c) an unconscious verbal component. 
For the gestalt figures (a) and (b) will apply; for the nonsense figures 
(a) only. 

These sources of variance may be separated by using difference scores; 
for example, meaningful minus nonsense should leave the variance of the 
unconscious spatial and verbal components, and so forth. 
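
The difference-score logic can be illustrated with a minimal Python sketch. The scores below are hypothetical values for five imaginary subjects, not data from the study, and the helper names `difference_scores` and `pearson_r` are introduced here only for illustration.

```python
from math import sqrt


def difference_scores(m, g, n):
    """Per-subject difference scores between the three classes of figures."""
    return {
        "M-N": [a - c for a, c in zip(m, n)],  # presumed spatial + verbal components
        "G-N": [b - c for b, c in zip(g, n)],  # presumed spatial component
        "M-G": [a - b for a, b in zip(m, g)],  # presumed verbal component
    }


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores (illustrative only, not taken from the experiments).
m_scores = [18, 20, 15, 17, 19]
g_scores = [15, 17, 14, 15, 16]
n_scores = [12, 13, 12, 13, 12]
criterion = [50, 58, 41, 47, 55]  # hypothetical criterion-test scores

diffs = difference_scores(m_scores, g_scores, n_scores)
r_mn = pearson_r(diffs["M-N"], criterion)
```

Each difference score is then correlated with a criterion test in the same way as the raw M, G, and N scores.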
EXPERIMENTS 
GENERAL PROCEDURE 


In the experiments reported below, the lines in each of the nine figures shown in 
Figure 1 were presented one at a time, each line being drawn on a 4 x 4 grid. Each 
line was exposed for two seconds with no time interval between exposures. After 
viewing the nine separately presented lines of a figure, the subject was asked to 
reproduce all he could remember on a single 4 x 4 grid. 

In order to check on procedures and the general promise of the test, a pilot study 
involving nine subjects was carried out. The results were encouraging, and on the 
basis of this experience the following experiments were designed. 


EXPERIMENT I 


The aims of this experiment were (a) to provide a check on the results 
of the pilot study; (b) to demonstrate that the subjects were not 
conscious of the over-all organization of the test figures. 


Method and Procedure 

In order to support the claim that the differences in mean scores obtained on M, 
G, and N figures are due to the operation of unconscious factors it is necessary to 
control the variable “difficulty of line sequence.” It is possible that, in selecting a 
given line sequence for presentation, a more difficult sequence might be selected for 
the N figures than for the M figures. If this were the case, the observed differences 
in means would reflect difficulty of sequential order. 

Systematic exploration of all possible orders of presentation would be so formidable 
a task that the following assumptions were made as to what would constitute easy and 
difficult sequences. It was assumed that an “easy” sequence would be one in which 
each line presented would be in contact with the preceding one at some point.5 A 
“difficult” sequence would be one in which successively presented lines would be as 
widely separated as possible, while a sequence of “intermediate” difficulty would be 
obtained by a mixture of the “easy” and “difficult.” 

Results in the pilot study showed the necessity of counterbalancing for series effects. 
In order that these effects would fall equally on each of the three orders of line 
sequences and each of the three classes of figures (M, G, and N), nine groups of 
subjects were required. Four subjects (university students) were assigned to each 
group, making 36 in all. Each group was tested on all nine figures (presented in a 
different order to each group), but a different order of line sequence was used for 
each group. Thus the mean score for each class of figures, based on 36 subjects, is 
derived from data counterbalanced for series effects, with respect to both line 
sequence and class of figure. 
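
One way to picture a counterbalanced ordering of this kind is a cyclic Latin square, sketched below in Python. This is only an illustration under stated assumptions: the text does not say how the nine presentation orders were actually constructed, and the figure labels and the function name `cyclic_latin_square` are hypothetical.

```python
def cyclic_latin_square(items):
    """Row i is the item list rotated left by i positions.

    Every item then occurs exactly once in every serial position
    across the rows (one row per group of subjects).
    """
    k = len(items)
    return [[items[(i + j) % k] for j in range(k)] for i in range(k)]


# Hypothetical labels for the three figures in each class (M, G, N).
figures = ["M1", "M2", "M3", "G1", "G2", "G3", "N1", "N2", "N3"]

orders = cyclic_latin_square(figures)   # one presentation order per group
subjects_per_group = 4                  # from the text
total_subjects = len(orders) * subjects_per_group
```

A cyclic square balances serial position only; it does not balance which figure immediately precedes which, so it should be read as a sketch of the idea rather than of the design actually used.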


RESULTS 


(a) The mean scores for M, G, and N figures, respectively, are: 17.75, 
15.48, 12.57. The pattern of results is consistent with the hypothesis. M 
scores are significantly higher than G scores (t = 2.225, p<.05) and N 
scores (t = 6.17, p<.001). G scores are significantly higher than N 
scores (t = 3.009, p<.01). 


5This is not strictly achieved in Fig. 1c (1). One break in the consecutive 
sequence is unavoidable. 

(b) Did the subjects become aware of the organization of M and G 
figures? Each subject performed on three figures of each class, making a 
total of 108 attempted reproductions of each class. Out of these totals of 
108, there were only 8 perfect reproductions of the M figures, 9 of the G 
figures, and none of the N figures. Even assuming that these perfect 
reproductions resulted from being aware of the “meaning” of the lines 
before they were drawn, they are not numerous enough to account for 
the results shown in (a) above. 





EXPERIMENT II 


Group results in Experiment I supported the hypothesis that the opera- 
tion of unconscious factors would produce differences in M, G, and N 
scores. The question remained whether these M, G, and N scores were 
differentially related with performance in reasoning and problem-solving. 
Demonstration of this relation, it was argued in the Introduction, would 
constitute one stage in the validation of the Lines Test as a measure of 
creative thinking. An answer to the question was sought in the form of 
product-moment correlations between scores on the Lines Test and on 
four criterion tests. 


Methods and Procedures 


For this experiment a specific line sequence was selected for each of the nine test 
figures. This selection was based on the results of Experiment I and was such that no 
M figure had a mean score smaller than any G or N figure, and no G figure had a 
mean score smaller than any N figure.6 

Four criterion tests were chosen: (1) the Mooney Closure Test (4), because it 
seemed to involve, at the perceptual level, processes similar to those assumed to be 
operating in the Lines Test; (2) the D.A.T. Spatial Relations Test, chosen as a check 
on the assumed unconscious spatial component in the Lines Test; (3) the D.A.T. 
Abstract Reasoning Test, as an example of a test based on matrices like those discussed 
in the Introduction; (4) the Otis Intelligence Test (higher form) as a test of general 
verbal intelligence. 

The subjects were 58 male office workers. 
6This does not bias the outcome of the experiment. The absolute levels of scores 
are of no consequence; our only concern is the covariance between scores on the Lines 
Test and the criterion tests. (Hindsight, however, acknowledges little merit in the 
logic underlying this selection, and better results could probably be secured by 
using the most difficult sequence in each class of figures.) 
RESULTS 


Results are shown in Table I. To aid in their interpretation, the pre- 
sumed sources of variance for each of the scores derived from the Lines 
Test may be repeated. M: rote memory, unconscious spatial and verbal 
components; G: rote memory, unconscious spatial component; N: rote 
memory; M - N: unconscious spatial and verbal component; G - N: 
unconscious spatial component; M - G: unconscious verbal component; 
M - G plus N: rote memory, unconscious verbal component. 


TABLE I 


PRODUCT-MOMENT CORRELATIONS OBTAINED IN EXPERIMENT II BETWEEN VARIOUS 
SCORES ON THE LINES TEST AND THE 4 CRITERION TESTS 


Test                    1   2           3           8           9           10          11
1. M                        [illegible] [illegible] .36**       .43**       [illegible] .29*
2. G                                    [illegible] [illegible] .33*        .58**       .2[?]
3. N                                                .17         .22         .22         [illegible]
4. M - N                                            .20         .28*        .42**       .2[?]
5. G - N                                            .19         .16         .45**       .06
6. M - G                                            .06         .03[?]      -.06        [illegible]
7. M - G plus N                                     .17         .29*        .21         .25
8. Mooney                                                       .33*        .03         .09
9. Space relations                                                          .43**       .25
10. Abstract reasoning                                                                  .5[?]**
11. Otis I.Q. 


*Significant at 5% level. 
**Significant at 1% level. 

Table I shows that the correlations between criterion tests and M, G, 
and N scores are consistently in line with the hypothesis. M scores are 
significantly correlated with all criterion test scores; G scores are signi- 
ficantly correlated with all but one (the correlation between G scores 
and Otis scores falls slightly below the 5 per cent level of significance). 
N scores are not significantly related to any of the criterion test scores. 
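
The significance markings in Table I can be related to the sample size with a short Python sketch. The text does not state which test was applied to the correlations; the sketch assumes the usual two-tailed t-test for a product-moment correlation with n - 2 degrees of freedom, and the critical value 2.003 (df = 56, 5 per cent level) is an assumed constant taken from standard t tables, not from the paper.

```python
from math import sqrt

N_SUBJECTS = 58     # number of subjects in Experiment II (from the text)
T_CRIT_05 = 2.003   # assumed: two-tailed critical t for df = 56 at the 5% level


def t_from_r(r, n):
    """t statistic for testing a product-moment correlation against zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def is_significant(r, n=N_SUBJECTS, t_crit=T_CRIT_05):
    """True if |t| exceeds the assumed two-tailed 5% critical value."""
    return abs(t_from_r(r, n)) > t_crit


# Two adjacent values from the "M - G plus N" row of Table I.
starred = is_significant(0.29)     # printed with one asterisk
unstarred = is_significant(0.25)   # printed without an asterisk
```

Under these assumptions a correlation of about .26 or more reaches the 5 per cent level with 58 subjects, which agrees with the asterisks on the legible entries.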

The various difference scores produce few significant correlations with 
criterion tests. However, the over-all pattern of results suggests that the 
postulated sources of variance are operating. Effects of the unconscious 
verbal component do not show up strongly, but wherever this factor is 
presumed to be present (M, M — N, M —G) there are on the average 
higher correlations with the verbal criterion test (Otis) than when it is 
absent (G, N, G—N). The scores where the unconscious spatial com- 
ponent is presumed to be present (M, G, M— N, G — N) show higher 
correlations with the D.A.T. Spatial Relations Test than do those in which 
this factor is absent (N, M - G). It is of interest to note also that the 
highest correlations are obtained with the D.A.T. Abstract Reasoning 
Test. Matrices tests are the closest in principle to the Lines Test since 
they contain a “hidden principle,” but one which, because of the orderly 
sequence of the matrix, is relatively easy to discover. 

The basic results as they concern M, G, and N scores, together with the 
more detailed evidence related to the postulated sources of variance, fit 
with some degree of comfort into the general hypothetical framework of 
this investigation. 


Discussion 


At a factual level, the results show that, in a task of rote memory, the 
presence of a meaningful organization of which the subject is unaware 
produces results significantly related to his performance on tasks in- 
volving reasoning and problem-solving. When the organization is absent 
these relations do not appear. It has already been argued that the 
demonstration of these relationships is one stage in the validation of the 
Lines Test. Negative results would have destroyed any claim that it 
could measure creative thinking. 

Although final validation must rest on empirical evidence involving ad- 
mittedly “creative” subjects, there are grounds for expecting such a test 
to be successful. In the criterion tests, especially in the matrices test, 
there is obviously a hidden principle; as already stated, however, this 
principle is relatively easy to discover. It is equally evident that the 
hidden principle in the Lines Test is very difficult to discover under the 
test conditions. We would argue that it is precisely the difficulty of dis- 
covering the hidden principle which distinguishes creative thinking from 
reasoning and problem-solving. An analogy may clarify this point. 

Darwin's discovery of the principle of evolution would be generally 
accepted as a creative achievement. Comparing the data as he had to 
deal with them and as they would be presented in a matrices test, we 
have an apt parallel with the comparison between the Lines Test and the 
matrices test. 

What Darwin had to discover was that a temporal sequence which 
applied to rock formations also applied to fossils.7 This was difficult 
because the rock formations were nowhere in complete order and were 
often jumbled together, fossils were of relatively rare occurrence, and 
they were discovered at different times and places. A further factor in 
making his discovery “improbable” lay in the general acceptance of the 
story of Genesis. 


7This is not intended to slight the tremendously difficult task of working out all 
the logical implications. 

If geological formations were regularly found neatly stacked in correct 
temporal order, with a fossil visibly implanted in each layer, one cannot 
believe that the common temporal sequence would long have evaded 
discovery, or that its discovery would have been regarded (even in spite 
of Genesis) as a particularly creative act. In other words, it seems to be 
the relative difficulty of discovering the hidden principle which under- 
lies the accepted distinction between creative thinking and simple reason- 
ing. 

The material in the Lines Test is presented to the subject in the 
jumbled order and temporal separation which characterized Darwin's 
data, and the subject’s acceptance of the task as one of “rote memory” 
may perhaps be equated with the effect of the story of Genesis. It is 
on the basis of this apparent “face validity” that we expect the test to 
survive empirical validation. 

That no such validation can yet be presented is owing to the necessity 
of adapting the test to various types of content. Since creative thinking 
typically occurs in those fields in which the thinker is well learned, 
different forms of the test will be required to investigate creativity in 
different fields. For example, to investigate literary creativity, the test 
must be of such a nature as to engage (produce interaction between) 
verbal organizations. In its present form, while the presumed spatial 
component gave convincing results, the presumed verbal component 
failed to produce significant correlations with the verbal criterion test. 
The verbal aspects of the meaningful figures were perhaps not suffi- 
ciently conspicuous. 

Another form of the test, possessing unmistakable verbal organization, 
has been prepared, and is found to correlate substantially with verbal 
intelligence. Direct validation is being planned, and a third, “art,” form 
is also being constructed. 

Regardless of the outcome of the validation studies, the present results 
seem worth communicating, since they suggest ways in which some of the 
problems involved in thinking and problem-solving may be reformulated 
and investigated. 


SUMMARY 


The assumptions are made that: (a) reasoning, problem-solving, and 
creative thinking all involve finding a principle, or mode of organization, 
which holds the key to the solution; (b) creative thinking may be dis- 
tinguished by the greater degree of difficulty in finding this principle. 
The difficulty lies in the fact that in the creative act a principle, already 
well established in relation to certain data, must be related to markedly 
different data. The process which “discovers” this relationship is con- 
ceived of as an interaction between conscious and unconscious processes. 

A test of immediate memory, the Lines Test, is offered as an approach 
to measuring this interaction. Items designed to elicit such interaction 
correlated significantly with tests of reasoning and intelligence. Items 
designed as controls did not. 
CLOSURE AS AFFECTED BY VIEWING TIME 
AND MULTIPLE VISUAL FIXATIONS1 






CRAIG M. MOONEY 
Defence Research Medical Laboratories 
Toronto, Ontario 






WHAT is the role of viewing time and multiple visual fixations in effecting 
closures with incomplete graphic representations of familiar objects? 
To answer this question would require three ways of presenting closure 
items: one, where there is ample time and opportunity for multiple visual 
fixations; two, where time is brief and only a single fixation is possible; 
three, where there is ample time, but only a single fixation is possible. 
An earlier study (4) verified that in addition to the two ways already 
available (direct exposure for unrestrained inspection and tachisto- 
scopic exposure) the third is available through the agency of negative 
after-images induced under flickering light. 

The present study utilized these three experimental techniques in 
attempting to discover whether the factors of viewing time and multiple 
visual fixations significantly affect closure. This entailed comparison 
of closure performance under conditions where the time allowed was 
ample and the nature, sequence, and number of visual fixations were 
systematically prescribed and varied. 
METHOD 


Experimental Materials and Procedures 

The closure test items and their manner of presentation have been described 
elsewhere (4). In brief: 50 incomplete black and white drawings based only on 
highlights or shadows of strongly lighted photographs of the heads and faces of 
miscellaneous persons constituted the closure test items; these Closure Faces, in 
their photographically positive and negative states, were projected on screens 
before the S in three ways: one, for direct inspection; two, tachistoscopically at 
speeds permitting only one visual fixation; three, as negative after-images (under 
light flickering at three cycles per second) permitting observation for lengthy periods 
of time with but one visual fixation. The projecting apparatus was a 500-watt slide 
projector, with a variable speed light interrupter having a light-dark ratio of 50:50 
in front of it. For tachistoscopic presentation, a second light interrupter was placed 
in front of this, and so geared that five out of six of the light flashes delivered by 
the first interrupter could be blocked. The S faced two screens four feet away and 
viewed projected test items approximately 20 by 30 inches in size. This rectangular 
space was dimly lighted in the intervals between exposure of test items, and S 
regarded the centre of it. The first screen, in front of the projector, was used 
to display test items; S, seated to right of projector, viewed these at an angle of 
about 20°; they subtended a visual angle of about 18°. The second screen, directly 
in front of S, was used for induction of negative after-images. 


1This paper is based on a thesis submitted to McGill University in partial ful- 
filment of the requirements for the degree of Doctor of Philosophy. The work was 
done in part at McGill University, supported by a Rockefeller grant to D. O. Hebb. 


EXPERIMENT I 


The purpose was to ascertain whether successive visual fixations are or 
are not essentially contributory to perception of the Closure Faces. 


Method 


The S was tested with a different group of items under each of three viewing 
conditions. The items were presented directly for free inspection for 30 seconds. 
They were also presented tachistoscopically at an exposure speed of 1/12 second; if 
not positively seen on one exposure, two successive exposures were given at 11/12 
second intervals, and, if not then perceived, three successive exposures were given 
at 11/12 second intervals. Finally, the items were presented in the state of nega- 
tive after-images for inspection for 30 seconds. 

Items consisted of 36 Closure Faces and 18 false ones (similar in terms of 
graphic “stuff,” but otherwise nonsensical) randomly sorted into three groups of 
12 real and 6 false. 

A counterbalanced 3 x 3 x 3 analysis of variance design was employed, account- 
ing for the variables, item groups, viewing conditions, and order. This, replicated 
three times, called for 27 Ss. 

The Ss were randomly selected adults from the staff of a medical research estab- 
lishment (10 scientists and 17 clerks, of whom 7 were women and 20 men, ranging 
in age from 18 to 43 years, with a mean age of 30). 

Ss were told that all test items were real faces. Scores for the real items were 
the numbers of correct closures (the incompletely represented person being seen and 
convincingly described by S in terms of sex, approximate age, orientation of the head, 
location of main features, apparent expression); and, for the false items, the numbers 
erroneously “seen.” Separate, comparable analyses were made of the two sets of 
scores. 


Results 


Scores, for real items correctly seen and for false items “seen,” are 
given in Table I. The analysis of variance for real items seen revealed 
no significant differences between performance by the method of direct 
inspection and that of negative after-images; performance was somewhat 
better (.05 level of probability) by the tachistoscopic method. Im- 
provement in performance attributable to order—that is, with successive 
item groups—was (.05 level) significant. The analysis of variance for 
false faces revealed that significantly fewer (.01 level) were “seen” by 
the method of direct inspection than by the other methods; there was no 
order effect. 
TABLE I 


PERCEPTIONS WITH THREE GROUPS OF TWELVE REAL AND SIX 
FALSE CLOSURE ITEMS UNDER THREE VIEWING CONDITIONS 


                                    Method of presentation
                 Order of
Closure items    presentation    Direct    Tachistoscopic    Neg. A.I.    Total
Real             1st             74        83                60           217
                 2nd             61        88                83           232
                 3rd             87        86                86           259
                 TOTAL           222       257               229          708
                 POSSIBLE        324       324               324          972
False            1st             12        25                25           62
                 2nd             19        26                22           67
                 3rd             14        20                21           55
                 TOTAL           45        71                68           184
                 POSSIBLE        162       162               162          486

In all, 73 per cent of the real faces were correctly seen and 38 per 
cent of the false were “seen.” Of closures by the tachistoscopic method, 
48 per cent were accomplished on one exposure, 39 per cent on the 
double exposure, and 13 per cent on the triple; for false faces, 13 per 
cent were “seen” on the single exposure, 38 per cent on the double, and 
49 per cent on the triple. The method of direct inspection resulted in only 
24 per cent of the false faces being erroneously “seen” compared to 
43 per cent by each of the other two methods. 


EXPERIMENT II 


The purpose was to assess perception of the Closure Faces as a func- 
tion of tachistoscopic exposure speed and the number of exposures. 


Method 

The S was tested with different groups of items at three tachistoscopic speeds— 
1/8 second, 1/16 second, 1/24 second. The S watched the centre of the dimly 
lighted rectangular area on the screen where the test item was to appear. An 
item was exposed once at the selected speed; if not perceived, it was exposed twice 
in succession; if still not perceived, it was exposed three times in succession. The 
intervals between successive exposures were, for the above exposure speeds, respect- 
ively, 11/8 second, 11/16 second, 11/24 second. Items were projected on a dull 
black screen to minimize after-images. 
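
Each stated interval is eleven times the corresponding exposure. A minimal Python sketch reproduces this arithmetic under one reading of the apparatus described under METHOD: a first interrupter with a 50:50 light-dark ratio, and a second interrupter passing one flash in six. That reading, and the function name `dark_interval`, are assumptions made here for illustration.

```python
from fractions import Fraction


def dark_interval(exposure, light_fraction=Fraction(1, 2), pass_one_in=6):
    """Dark time between successive exposures, in seconds.

    Assumes each flash lasts `exposure`, the light occupies `light_fraction`
    of every interrupter cycle, and only one flash in `pass_one_in` is passed.
    """
    cycle = exposure / light_fraction   # one light-plus-dark cycle
    period = cycle * pass_one_in        # onset of one passed flash to the next
    return period - exposure


# Exposure speeds used in Experiment II.
speeds = [Fraction(1, 8), Fraction(1, 16), Fraction(1, 24)]
intervals = [dark_interval(s) for s in speeds]
```

With these assumptions the computed intervals are 11/8, 11/16, and 11/24 second, matching the values given in the text; the same rule gives 11/12 second for the 1/12-second exposure of Experiment I.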

Items were 36 Closure Faces randomly sorted into three groups of 12. 

A counterbalanced 3 x 3 x 3 analysis of variance design was employed, account- 
ing for the variables—item groups, tachistoscopic speeds, and order. This, replicated 
three times, called for 27 Ss. 
Ss were undergraduates, 9 women and 18 men, with a mean age of 20 years. 
Scores were the numbers of correct closures, and these were analysed in terms of 
the exposure speeds and the number of exposures at which they occurred. 


TABLE II 


CLOSURES EFFECTED WITH THREE GROUPS OF TWELVE ITEMS 
WITH INCREASING NUMBERS OF TACHISTOSCOPIC EXPOSURES 


                   Presentation
Exposures    1st    2nd    3rd    Total
Single       85     137    166    388
Double       62     76     72     210
Triple       36     27     21     84
TOTAL        183    240    259    682
POSSIBLE     324    324    324    972





Results 


Scores, in terms of exposures and order, are shown in Table II. The 
analysis of variance showed no effects attributable to the different 
tachistoscopic speeds. There was a significant improvement (.01 level) 
in performance over successive item groups, and this was strikingly 
related to the number of exposures. On their first item group Ss effected 
12 per cent of their total closures on a single exposure and 5 per cent 
only after three exposures; on their third item group they were effecting 
24 per cent with a single exposure and 3 per cent with the triple. Dis- 
regarding order, 57 per cent of all closures were effected on the single 
exposure, 31 per cent on the double, 12 per cent on the triple. The over- 
all accomplishment was 70 per cent. 
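
The percentages quoted above follow directly from the counts in Table II. The short Python sketch below recomputes them; the variable and function names are introduced here for illustration only, and percentages are rounded to the nearest whole number.

```python
# Counts from Table II: closures by number of exposures (rows)
# and by order of item group, 1st to 3rd (columns).
table_ii = {
    "Single": [85, 137, 166],
    "Double": [62, 76, 72],
    "Triple": [36, 27, 21],
}
POSSIBLE = 972  # 27 Ss x 36 items

total_closures = sum(sum(row) for row in table_ii.values())


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Share of all closures effected on single, double, and triple exposures.
share_by_exposure = {k: pct(sum(v), total_closures) for k, v in table_ii.items()}

# Shares of all closures for the first and third item groups.
first_group_single = pct(table_ii["Single"][0], total_closures)
first_group_triple = pct(table_ii["Triple"][0], total_closures)
third_group_single = pct(table_ii["Single"][2], total_closures)
third_group_triple = pct(table_ii["Triple"][2], total_closures)

overall = pct(total_closures, POSSIBLE)
```

Note that the first-group and third-group figures are expressed as shares of all 682 closures, not of the closures within that item group.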


EXPERIMENT III 


The purpose was to assess perception of the Closure Faces as a func- 
tion of central and peripheral tachistoscopic presentation and the num- 
ber of exposures. 


Method 

The S was tested with different groups of items under two conditions. In one, 
the screen was dark except for a small dot of light projected on the centre of the 
space to be occupied by the test item; the dot blinked continuously in interludes 
between exposures with a 1/18 second flash at 11/18 second intervals. In the other 
condition there were four such dots similarly (and simultaneously) blinking in a 
12- by 16-inch rectangular pattern aligned with the four corners of the forthcoming 
test item. For central presentation S fixated the dot and anticipated a single 
exposure of a test item after four blinks, and its appearance again for a second, third 
and fourth time, with four blinks interposed each time. In peripheral presentation 
items were presented in the same way with S fixating in order the upper left, upper 
right, lower left, and lower right of the four interposed dots. The S was required 
to note and state on what exposures he effected a closure and to describe it at the 
end of the sequence. 

Items were 32 Closure Faces randomly sorted into two groups of 16. 

A counterbalanced 2 x 2 x 2 analysis of variance design was employed, account- 
ing for the variables—item groups, viewing conditions, and order. This, replicated 
three times, called for 12 Ss. 

The Ss were undergraduates, 6 men and 6 women, with a mean age of 20 years. 

Scores were the numbers of correct closures, and these were analysed in terms 
of central and peripheral presentations and the exposure at which they occurred. 


TABLE III 


CLOSURES EFFECTED WITH TWO GROUPS OF SIXTEEN ITEMS WITH SUCCESSIVE CENTRAL 
AND PERIPHERAL TACHISTOSCOPIC EXPOSURES 


                        Exposures
Presentation    1st    2nd    3rd    4th    Total    Possible
Peripheral      24     32     28     47     131      192
Central         31     45     25     39     140      192
TOTAL           55     77     53     86     271      384


Results 

Scores, for the peripheral and central methods of presentation, broken 
down by the exposure on which closure occurred, are shown in Table III. 
The analysis revealed no single significant source of variance except— 
for each method—superior performance (.05 level) on the second and 
fourth exposures. The over-all accomplishment was 71 per cent. 


EXPERIMENT IV 


The purpose was to ascertain if closure performance was essentially 
dependent on the number of exposures. 


Method 

The method of central tachistoscopic presentation used in Experiment III was 
employed, with a similar design, and the same materials. The conditions were one 
versus two exposures. Eight Ss were used. 


Results 


Results, shown in Table IV, reveal that as many closures were effected 
on one as on two exposures; and that the over-all accomplishment was 
68 per cent. 
TABLE IV 


CLOSURES EFFECTED WITH TWO GROUPS OF SIXTEEN ITEMS 
WITH ONE AND TWO TACHISTOSCOPIC EXPOSURES 


                          Exposures
Item groups    Order    One    Two    Total
Group 1        1st      22     16     38
               2nd      23     25     48
Group 2        1st      24     23     47
               2nd      17     25     42
TOTAL                   86     89     175
POSSIBLE                128    128    256





SUMMARY OF RESULTS 


With ample time allowed, the perceptual performance was equally 
effective whether multiple visual fixations were permitted or observation 
was limited to a single, fixed point of regard. 

With observation limited to a single, fixed point of regard, the per- 
ceptual performance was equally effective whether ample time was 
afforded or but a fraction of a second. 

When only brief observations were permitted the perceptual perform- 
ance was not improved by a succession of these; nor did it matter 
whether fixation points were prescribed or not, or whether fixations 
were central or peripheral. 

The method of direct inspection was superior to the other methods 
of viewing in only one respect: significantly fewer of the false items were 
mistakenly seen. 


Discussion 


Evidently perception of the Closure Faces is not essentially dependent 
on or facilitated by prolonged inspection or a sequence of scanning eye 
movements. Closures occur fortuitously and apparently instantaneously 
at a single glance or with a single fixation. This finding invites explana- 
tion in view of the importance attached to scanning eye movements in 
the perceptual process. 

Hebb (2, pp. 32-34) has proposed that perceptual learning “depends 
originally on multiple visual fixations” and that, thereafter, perception of 
a simple or familiar figure “is definitely clearer, more effective with 
them than without.” Gibson (1, p. 57) has submitted “that the visual 
world is dependent on eye movements and is not seen as the result of 
a single fixation or a momentary visual field.” Lashley (3, p. 432) 
has remarked that “one perceives very little with an exposure too brief 
to permit of eye movements and most of our perception of objects is 
derived from a succession of scanning movements, the succession of 
retinal images being translated into a single impression of form.” 

Such propositions would lead one to suppose that, when perceptual 
arrests occur because of the partial representation of familiar objects, 
resort to scanning eye movements should prove efficacious. The logic 
might be that this would be a natural regression, likely to reconstitute 
elements which had been earlier constitutive of the percept; or that 
perceptual resolution would be promoted by the forcing or reinforcing 
of serial incorporative processes through the agency of multiple visual 
fixations. Thus, in the present experiments, it might be expected that 
the method of direct inspection would prove superior to the other two 
methods. 

However, direct inspection did not prove superior, except in the non- 
perception of the false items. One must reason, therefore, that while 
eye movements occurred they were unavailing and gratuitous. This is 
to say that the graphic representations (these Closure Faces) gained 
nothing as perceptual stimuli by being inspected, but operated as 
maximally valent wholes from the outset; that the central incorporative 
happenings (whatever and however they might be) which were involved 
in the perceptual events proceeded solely from these total “givens” and 
were not further facilitated by scanning eye movements. 

This suggests that while there is a supplementary role for eye move- 
ments in the clarification and identification of the explicit elements of 
complex presentations (in proofreading, for example, or the discovery 
of configural or contextual anomalies), the perception of familiar ob- 
jects does not necessarily entail or require such visual elucidation. 
When—as with these Closure Faces—it is precluded by excising all 
redundant material, there is no other role for scanning eye movements; 
they are not involved in the perceptual events; single glances suffice. 
And it would appear that, for this, the elements of the stimulus complex 
subscribe all together to each perception by virtue of their formal con- 
gruence with the implied whole object which they partially represent. 


SUMMARY 


This study was concerned with the role of time and multiple visual 
fixations in the perception of incomplete representations of particular 
kinds of human heads and faces. 

Closures were elicited under three contrasting viewing conditions: 
one, where time was ample and opportunity for multiple visual fixations 
was afforded; two, where time was ample but only one fixation was 
permitted; three, where but a brief, single glimpse was afforded. 

Perceptual performance did not differ under these diverse conditions 
and it was concluded that viewing time and scanning eye movements 
are neither essential nor contributory to the perception of familiar 
objects. 
FREE ASSOCIATION TIME AS A FUNCTION OF WORD 
FREQUENCY 


JOHN F. HALL AND ALVIN UGELOW 
Pennsylvania State University 


THE WORD association test, which has had a long history in the assess- 
ment of guilt, conflict and maladjustment, would also seem to hold 
promise as an indicator of an individual's verbal habit structure, know- 
ledge of which would be helpful in the prediction of language behaviour. 
From this point of view, the use of the conceptual framework which 
Woodworth and Schlosberg (11) have proposed (association time as a 
function of stimulus and organismic variables) is particularly appropriate. 

In the examination of organismic variables, Wispe (10) has shown the 
importance of physiological needs, and Cantril (1) and McGinnies (7) 
the importance of value systems, in determining associative times or 
responses. 

Stimulus variables have not been extensively investigated, although 
one would predict that, inasmuch as stimulus frequency or familiarity has 
been found to play an important part in perception (5) and learning 
(4), it would be similarly important in the determination of associative 
reaction time. In fact, Cofer and Shevitz (3), using high and low fre- 
quency words from the Thorndike-Lorge word count (9), did find a 
relationship between the frequency of the stimulus word and the number 
of associations which that word produced within a ten-minute period. 
This technique, however, is so similar to continuous free association 
that its use invariably raises the question whether the associations pro- 
duced are a function of the stimulus word or of the previous response. It 
would appear desirable to extend the Cofer and Shevitz (3) results to 
the discrete free association situation, using association times, rather than 
number of associations, as the response measure. 


EXPERIMENTAL PROCEDURE 
Subjects 
Forty students (25 males and 15 females) were recruited from an introductory 
psychology class at Pennsylvania State University. 
Materials 


The word list was constructed as follows. Forty-eight five-letter words were 
selected from the Thorndike-Lorge Teacher's Word Book of 30,000 Words. Twenty- 
four had a low frequency count (1 to 4 occurrences per million words) while the 
other twenty-four had a high frequency count (100 or more occurrences per million). 
In order to obtain some generality with respect to parts of speech, eight nouns, eight 
adjectives, and eight verbs were included in each frequency group. 

The words used in each grouping were selected as follows. A table of random 
numbers was consulted to provide a list of page numbers ranging from 1 to 208 (the 
number of pages in the Word Book). Turning to the page indicated by the first 
random number, the first word was chosen which had the required frequency, 
consisted of five letters, and was the part of speech desired. If the page provided no 
word with the necessary characteristics, succeeding pages in the Word Book were 
examined until such a word was found. This process was repeated until the 48 words 
to be used in the experiment were obtained. High frequency nouns, adjectives, and 
verbs, and low frequency nouns, adjectives, and verbs were selected in that order. 
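
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `word_book` layout (page number mapped to a list of word, count and part-of-speech entries), the function names and the toy entries are all hypothetical, and the text does not say what was done if the search ran past the last page.

```python
def pick_word(word_book, start_page, in_band, part_of_speech, last_page=208):
    """Return the first qualifying five-letter word at or after start_page.

    word_book maps page number -> list of (word, count_per_million, pos)
    tuples; this layout is hypothetical, not the Word Book's real format.
    """
    for page in range(start_page, last_page + 1):
        for word, per_million, pos in word_book.get(page, []):
            if len(word) == 5 and pos == part_of_speech and in_band(per_million):
                return word
    return None  # end of book reached; the text does not cover this case


def is_high(per_million):
    return per_million >= 100      # "100 or more occurrences per million"


def is_low(per_million):
    return 1 <= per_million <= 4   # "1 to 4 occurrences per million"


# Made-up entries for illustration only (counts are not Thorndike-Lorge values).
toy_book = {
    3: [("table", 150, "noun"), ("xylem", 2, "noun")],
    4: [("green", 120, "adjective")],
}

print(pick_word(toy_book, 1, is_high, "adjective"))
print(pick_word(toy_book, 3, is_low, "noun"))
```

In use, `start_page` would be supplied by the table of random numbers (or a stand-in such as `random.randint(1, 208)`), and the function would be called once per word needed in each frequency and part-of-speech group.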


Apparatus 

The Gerbrands Electronic Voice Key, to which was attached a Standard Electric 
Timer, was used in obtaining the free association times. Times were measured in 
hundredths of seconds. 


Procedure 


After the S had reported to the experimental room, he was comfortably seated at a 
small table with a microphone in front of him, and the E read a short paragraph 
indicating the nature of the task. Five to ten stimulus words were then read from 
which free association responses were obtained. This made S better acquainted with 
the task, and also enabled E to adjust the voice key so that S’s normal manner of 
responding would stop the chronoscope. The 48 test words were then read, and his 
responses and response times recorded. 


RESULTS AND DISCUSSION 


Out of a possible 1,920 responses from the 40 subjects, 228 were lost 
because of (1) failure of S to respond loudly enough to stop the chrono- 
scope, or (2) E speaking too loudly, which not only started the chrono- 
scope but stopped it as well. Analysis of these non-timed responses re- 
vealed that 118 of them occurred with high frequency stimulus words, 
and 110 with low frequency. 

A further 136 responses were lost because S blocked to the stimulus 
word. To avoid embarrassing him by waiting too long while he groped for 
a response, E stopped the clock when S appeared uncomfortable and 
went on to the next word. On other occasions, S would state, “I just can’t 
think of a word.” This verbal response, of course, stopped the clock. 
Since no significant response was made in either of these cases, the 
latencies recorded were not included in the latency analysis, but were 
dealt with separately. The latency analysis is based, then, upon a total of 
1,556 responses obtained from 40 subjects. 

The statistical analysis was based on the mean latencies to the high and 
low frequency words, calculated separately for each S and then averaged. 
(Although the distribution of free association times for a single S is quite 
skewed, a normal distribution is approximated when mean latencies for 
each S are plotted.) 
The mean latencies, in hundredths of seconds, were: 


High frequency words 179.16 
Low frequency words 256.51 


The difference, using the difference method, was highly significant 
(t=7.89, p<.01). The consistency of this finding is attested by the fact 
that 38 of the 40 subjects gave a shorter mean association time to the high 
frequency words than to the low frequency. 
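
As a rough sketch of the kind of test reported here, the following assumes that "the difference method" refers to a t test computed on the per-subject differences between the two mean latencies (a paired comparison). The function name and the three-subject data are hypothetical and serve only to show the arithmetic; they are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(high, low):
    """t statistic and degrees of freedom for per-subject differences (low - high)."""
    diffs = [l - h for h, l in zip(high, low)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample SD / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical mean latencies (hundredths of a second) for three subjects.
high_means = [170, 180, 190]
low_means = [171, 182, 193]

t, df = paired_t(high_means, low_means)
print(round(t, 3), df)
```

With 40 subjects the same computation would have 39 degrees of freedom.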

The distribution of blocking gives further evidence of the relation 
between longer association times and low frequency words. The mean 
number of blocks for all Ss to high and low frequency words was: 


High frequency words .18 
Low frequency words 3.08 


This difference, using the difference method, is again highly significant 
(t = 7.57, p<.01). No S had more blocks to the high frequency words 
than to the low. 

These findings extend the conclusions of Cofer and Shevitz (3) to the 
discrete free association situation, and also emphasize the need for 
equating frequency of word count in word association experiments. 


STIMULUS FREQUENCY AND RESPONSE COMMUNALITY 


The early study of Thumb and Marbe (quoted by Woodworth (11)) 
and its later confirmations (2, 8) indicated that the more common a 
response was (“response communality”), the shorter was the association 
time. Since we have shown that short association time is also related 
to frequency of usage of the stimulus word, one would assume a relation- 
ship between frequency of usage and communality of response. Speci- 
fically, one would predict that high frequency stimulus words should 
produce greater communality (i.e., smaller range) of responses than 
low frequency stimulus words. Our data provide an empirical test of this 
prediction. 

The number of different responses to each stimulus word was computed 
and divided by the total number of responses to that word.¹ The 
lower the resulting value, the higher the communality of the responses, 
that is, the more they are concentrated among relatively few response 
words. To the stimulus word “early,” for example, only five different 
responses were made by the 40 subjects, while to the word “natal” nine 
subjects failed to respond and 20 of the remaining 31 responses were 
different. Hence the communality value assigned to “early” was .125, 
and that to “natal” .645. 

¹In determining the number of different responses, a response which was plural in 
form was considered to be the same as the singular. 


The mean values for the high and low frequency stimulus words were: 


High frequency words .381 
Low frequency words .642 


The difference is significant (t = 4.62, p<.01) and supports the hypo- 
thesized relationship, at least for the sample of subjects used. 
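
The communality value defined above (number of different responses divided by total number of responses) can be sketched as follows. The function name, the `canonical` hook for merging plural and singular forms, and the placeholder response lists are assumptions for illustration; only the counts (5 of 40 for "early", 20 of 31 for "natal") come from the text.

```python
def communality_value(responses, canonical=str.lower):
    """Different responses / total responses; None marks a failure to respond.

    canonical maps each response to the form under which it is counted
    (e.g. plural to singular); the default only lower-cases.
    """
    given = [canonical(r) for r in responses if r is not None]
    return len(set(given)) / len(given)


# Placeholder response words reproducing the counts reported in the text.
early = [f"word{i % 5}" for i in range(40)]                 # 5 different among 40
natal = [None] * 9 + [f"word{i % 20}" for i in range(31)]   # 20 different among 31

print(round(communality_value(early), 3))
print(round(communality_value(natal), 3))
```

Lower values mean that responses are concentrated on fewer words, that is, higher communality.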


SUMMARY 


The study was designed to determine the relationship between 
frequency of the stimulus word, as indicated by the Thorndike-Lorge 
frequency count, and free association time. Analysis of the free association 
times to 48 words for 40 Ss revealed that latency was significantly shorter 
to high frequency words than to low. An analysis of the blocks made to 
the high and low frequency stimulus words provided confirming evidence. 
Examination of the response words revealed that frequency of stimulus 
word was also related to communality of response. 


A RATIONAL METHOD FOR LOCATING TESTS 
IN AGE SCALES¹ 


MONIQUE LAURENDEAU AND ADRIEN PINARD 
Université de Montréal 

THE CONSTRUCTION of a scale of mental development, currently under way 
at the Institut de Psychologie of the Université de Montréal,² makes it 
possible to reopen a difficult problem which has been much discussed in 
the past but which does not yet seem to have received a satisfactory 
solution. This problem concerns the selection and location of tests, for 
each age level, in a mental age scale (of the Binet-Simon or 
Terman-Merrill type). The question seems to be surrounded by a climate 
of confusion or contradiction which makes it difficult to understand the 
rationale of the technique and to apply it to particular cases. 

The various solutions proposed are either strictly empirical (2, 6, 14, 
17, 18), without lending themselves to any real logical justification, or 
strictly rational (7, 8, 11, 12, 13, 16), without ever arriving at a 
precise criterion that could guide the test constructor. Consequently, 
the techniques applied so far to the selection and location of tests are 
imprecise from the outset and almost always degenerate into trial and 
error. Most often (14, 17, 18), one chooses, among the tests that have 
been constructed, those which are passed more and more frequently at 
successive ages, without specifying whether this gradual increase should 
be regular (e.g., 15 per cent at 6 years, 30 per cent at 7 years, 45 per 
cent at 8 years, 60 per cent at 9 years, 75 per cent at 10 years, etc.) 
or whether it should be marked by an abrupt jump at one particular age 
at least (e.g., 15 per cent at 6 years, 20 per cent at 7 years, 30 per 
cent at 8 years, 65 per cent at 9 years, 80 per cent at 10 years, etc.). 
It is accepted in theory that, for a problem to be assigned to a given 
age, the percentage passing must reach 60 to 75, whereas in fact it 
varies from 45 to 85, and even beyond these limits (14). In other cases 
(2, 6, 15), it is required that at least 75 per cent of the subjects of a 
given age pass a problem before it can be called typical of that age, but 
no account is taken of the percentage passing found at the earlier ages. 
Thus, at the immediately preceding age, this percentage may be 
indifferently 30 or 70. 


¹A summary in English follows the article. 
²This research, begun a little over three years ago, was made possible by 
grants from the Ministère de la Santé (Federal-Provincial agreement). 

These various techniques can only yield inconsistent results. In 
general, the mean and dispersion of the intelligence quotients obtained 
with instruments constructed in this way vary considerably from one age 
to another. It follows that two examinations taken by the same subject a 
few years apart may show very considerable differences, without it being 
possible to conclude that there has necessarily been a real change in the 
subject's abilities. A study by Goodenough (9) of the Terman test 
demonstrates this clearly, pointing out that the poor location of the 
problems explains this excessive variability. Another drawback, also 
resulting from the imperfections of the technique used, is that the same 
test result may characterize very different performances. Two children, 
for example, will obtain a mental age of six years; but whereas the first 
will have passed all the tests of that age and failed those of the higher 
ages, the second may have compensated for failing problems located well 
below the six-year level by passing others located well above it. The 
inconsistency of these results would seem to arise from two main causes: 
the average level of success at each age is fixed too arbitrarily, and 
the discriminative value of the problems does not receive the attention 
it deserves. 


AVERAGE LEVEL OF SUCCESS 


The variation observed in the average level of success on the tests 
assigned to each age level does not seem to follow any rigorous 
principle. From the figures supplied by McNemar (14, pp. 89-98), it is 
possible to calculate the mean percentages passing the tests retained at 
each age level in the Terman-Merrill revision of the Stanford-Binet. 
Table I reports these percentages. As can be seen, the average level of 
success tends to decline from one age to the next, although no rule or 
rigorous principle can be derived from it. As Jaspen (18) and Doll (7) 
have quite rightly remarked, the percentages passing should be expected 
to be always above 50 and relatively higher among young children than 
among older ones. This phenomenon would be explained by the progressive 
differentiation of intellectual functioning: as the child develops, he 
learns more and more things, and this is why many more differences can 
be found between two children at twelve years than at two years. In 
other words, since the mental abilities of a young child are much less 
varied or less differentiated than those of an older child, resemblances 
will be greater at the lower ages than at the higher ages. 


TABLE I 

MEAN PERCENTAGES PASSING OBSERVED OVER ALL THE TESTS ASSIGNED 
TO EACH AGE LEVEL IN THE TERMAN-MERRILL SCALE 

              Percentage passing                 Percentage passing 
Age                                  Age 
(in years)    Form L    Form M       (in years)  Form L    Form M 

2               75        74          7            61        63 
2-6             78        73          8            61        65 
3               70        71          9            58        60 
3-6             71        73         10            60        60 
4               70        74         11            57        59 
4-6             70        74         12            65        64 
5               72        70         13            65        64 
6               72        76         14            63        60 

If this is true, it should be possible to find a technique for 
determining the ideal percentage passing for each age level, and these 
percentages should be adhered to as closely as possible when assigning 
tests to one level or another. For, in our opinion, it is of the first 
importance to place each test at an age where it characterizes a real 
acquisition in mental development, if one wishes to avoid underestimating 
or overestimating the abilities proper to each age level and also to 
facilitate the administration of the test. 

In this connection, it may be appropriate to review the arguments 
advanced by McNemar (14, p. 87) in reply to criticisms of the location of 
tests in Terman's revision. His first argument simply states the 
impossibility of obtaining a mean intelligence quotient of 100 at each 
age when 50 per cent is chosen as the criterion of success. Garrett (8) 
had already pointed out, without himself proposing a different criterion, 
that such an argument amounts to a simple statement of fact and really 
explains nothing. What seems to us more curious still is that McNemar 
begins by denying the practical possibility of arriving at mean 
percentages of 50 and then nevertheless admits their theoretical 
necessity by alleging, in a third argument, that all children do after 
all have the opportunity to take tests of 50 per cent difficulty. His 
second argument, on the other hand, seems to us contradictory. In 
asserting that the assignment of tests to certain age levels is a matter 
of pure convenience and is intended only to simplify administration and 
scoring, he appears to deny the theoretical importance of exact location. 
Since subjects take tests assigned to different age levels and receive 
credit for all their successes, McNemar implies that an approximate 
location may be sufficient. 

Now, invoking the same reasons as he does, it seems to us on the 
contrary essential to assign the subtests to their true age. Indeed, if 
one really wishes to facilitate the administration of an age scale, one 
must avoid having to give the subject the tests of every level. It is for 
this reason that, in practice, the starting point of the examination is 
always taken to be the age at which the subject passes all the tests 
(basal age) and the stopping point the age at which he fails them all. 
Under these conditions, the exact location of the tests assumes 
considerable theoretical importance since, if tests that are too easy or 
too difficult have been placed at certain ages, it is hard to see how, in 
examining a subject, one can determine a starting point and a stopping 
point that truly correspond to reality. Moreover, an imprecise location 
deprives the subject of the possibility of recording successes and 
failures on tests which he will not even attempt, simply because the 
examination begins at the age at which he passes all the problems and 
ends at the age at which he fails them all. And since, in practice, 
these two limits are never exceeded, the results obtained may be either 
too high or too low, as the case may be. In this connection, one may 
reread the work of Berger and Speevack (3, 4) on the performance of 
retarded children on the Terman test. In short, the very commendable 
desire to simplify the administration of a test should be capable of 
realization without risk of compromising the validity of the results. 

One could always attempt to refute these statements by pointing to the 
fact that the mean of the intelligence quotients obtained from age scales 
constructed in this way lies in the neighbourhood of 100 at all age 
levels and that, consequently, the location of the tests must be 
sufficiently exact. If, indeed, five-year-old children obtain on the 
average a mental age of five years, one should be able to conclude that 
the tests passed by these children do characterize that age. But it is 
known that the means calculated in the Terman and Merrill revision, for 
example, quite often depart from 100: in Form L of this test, they in 
fact range from 100.9 to 109.9 (raw figures). At first there was a 
temptation to attribute these differences to sampling errors (18); but 
certain studies (1, 5) were later able to show that these variations, 
which are moreover statistically significant, are attributable above all 
to internal characteristics of the test. Even supposing that the mean 
intelligence quotient lay in the immediate neighbourhood of 100 at all 
ages, it should not be forgotten that, in these scales, the same mental 
age can be obtained through very diverse and sometimes peculiar 
combinations of failures and successes. In other words, paradoxical as it 
may seem, it is perhaps solely because the scale does not meet the 
conditions of transitivity and asymmetry that a certain consistency can 
be achieved. However that may be, one cannot but be concerned about the 
imperfections of the technique applied to the selection and location of 
tests in age scales, and notably in the two principal revisions of the 
Binet test (17, 18). It is always a matter of a trial-and-error procedure 
which consists of assembling and reassembling the tests in different ways 
until the goal (a mean intelligence quotient of 100) is approximately 
reached. A more rational and perhaps more valid technique should be able 
to replace this rather too primitive procedure. 


DISCRIMINATIVE VALUE 


The second important factor to be noted concerns the discriminative 
value of the problems. It seems that insufficient account is taken of the 
percentages passing obtained at the ages preceding or following that to 
which a problem is assigned. Let us compare, for example, the 
distribution of the percentages passing obtained at five age levels on 
the following two items taken from Terman (17, p. 89): 





                 1-6 years   2 years   2-6 years   3 years   3-6 years 
Item L,II-6,4        0          23      a 97 
Item L,II-6,1       10          52        77          89         95 


In our opinion, there is a considerable difference between these two 
items, even though both are assigned to 2-6 years. In the first case, it 
can be said that the ability in question is acquired rather suddenly at 
the age of 2-6 years and that, consequently, success on this item does 
characterize that particular age, since the great majority of younger 
children fail it. In the second case, on the contrary, the distribution 
of percentages shows much more continuity. Even though the problem is 
passed more and more frequently at successive ages, the assignment of 
this item to a particular age is very uncertain and, in any event, it is 
not clear why it is a 2-6-year item, given that 52 per cent of 
two-year-old children already pass it. If, in an age scale, the aim is to 
assemble problems truly characteristic of the ages studied, it seems 
reasonable to look, other things being equal, for problems which give 
rise to an abrupt increase in successes at the age where they are located 
and which, at the same time, do not show too wide a spread of ages 
between total failure and perfect success. The increase in successes 
must, moreover, be the more abrupt, and the spread of ages between total 
failure and perfect success the smaller, the younger the children, if it 
is accepted that mental development is much more rapid at the lower ages. 
Now this is an observation whose accuracy most specialists (7, 10) agree 
in recognizing, even if there is disagreement about the precise form of 
the curve that expresses this development. 

Under these conditions, it should be possible to find a technique for 
determining the ideal percentages passing for each of the age levels 
preceding or following that at which a problem is located. Once these 
percentages have been determined, it is, in our opinion, absolutely 
necessary to conform to them as closely as possible when choosing the 
tests to be retained in a scale, if its discriminative value is to be 
assured. Otherwise one would risk obtaining inconsistent means at the 
different ages and artificially extending the dispersion of the results. 

Let us consider, for example, the seven items that Terman assigns to 
the age of 4 years (Form L). These problems are passed on the average by 
10 per cent of subjects aged 2-6 years, 24 per cent of subjects aged 3 
years, 56 per cent of subjects aged 3-6 years and 70 per cent of subjects 
aged 4 years. By placing these items at 4 years, one thereby admits that 
all these subjects have a mental age of at least 3-7 years, since they 
are capable of passing one or more of the 4-year tests. Now, if these 
results are translated into terms of intelligence quotient, this amounts 
to saying that 10 per cent of subjects aged 2-6 years have a quotient of 
at least 143, 24 per cent of subjects aged 3 years at least 119, 56 per 
cent of subjects aged 3-6 years at least 102 and 70 per cent of subjects 
aged 4 years at least 90, which singularly contradicts the data of the 
normal curve. In reality, as will be seen below, these average successes 
should lie rather in the neighbourhood of 0 per cent at 2-6 years, 5 per 
cent at 3 years, 30 per cent at 3-6 years and 64 per cent at 4 years for 
the results to conform more exactly to the proportions of the normal 
curve. 
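
The quotients quoted in this example (143, 119, 102 and 90) can be checked with a short sketch. It assumes the usual ratio definition of the intelligence quotient, 100 times mental age over chronological age, which the text uses implicitly; the helper names and the "years-months" string parsing are illustrative choices, not part of the original method.

```python
def to_months(age):
    """Convert an age written as 'years' or 'years-months' (e.g. '3-7') to months."""
    years, _, extra = age.partition("-")
    return 12 * int(years) + (int(extra) if extra else 0)


def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: 100 * mental age / chronological age, rounded to a whole number."""
    return round(100 * to_months(mental_age) / to_months(chronological_age))


# A mental age of at least 3-7 credited to children of four chronological ages.
for ca in ["2-6", "3", "3-6", "4"]:
    print(ca, ratio_iq("3-7", ca))
```

The loop reproduces the four minimum quotients cited in the paragraph above.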

One may wonder why such a phenomenon does not recur more frequently in 
the Terman scale, given that, over all the items assigned to certain 
ages, the mean percentage passing observed at the preceding ages greatly 
exceeds these proportions. Already at the immediately preceding age, for 
example, this percentage is often above 45 and may even reach 58 or 59. 
The only possible explanation lies in the fact that success on the tests 
of one level does not necessarily imply success on those of the preceding 
levels. Let us emphasize that this inconsistency exists not only at the 
level of individual successes, but even at the level of collective 
successes. At the higher ages, for example, the tests have about the same 
average difficulty at the age where they are located and at the adjacent 
ages. A more detailed examination of each of the problems is even more 
revealing. If the relative success of two tests located at the same age 
level is compared, it is sometimes observed that one of the two proves 
easier for the subjects of the preceding age than the other is 













1957] LOCALISATION DES TESTS 39 


for the subjects of the age at which it is located. Now these anomalies 
should not be found in a measuring instrument satisfying the normal 
conditions of transitivity and asymmetry. The perfect realization of 
these conditions no doubt remains an ideal impossible to attain in 
practice, especially if they are to apply not only to collective 
performances but also to individual performances. The fact remains, 
however, that only a careful control of the discriminative value of the 
items can reduce the excessive number of cases, so difficult to justify 
logically, in which a subject is seen to pass difficult items after 
having failed easier ones. 

At the end of this long discussion, one conclusion is inescapable. The 
percentages passing observed at several successive ages should serve as 
the first criterion in the selection and location of the subtests to be 
retained in an age scale. This seems to be the only way of arriving, 
without trial and error, at a collection of problems that are valid and 
capable of yielding consistent results. 


PRESENTATION OF A RATIONAL METHOD 


The problem is to determine the ideal percentages passing which will 
serve in the selection and location of the best tests to be retained in 
the final scale. There are probably several techniques applicable to the 
calculation of these percentages. The method suggested here has the 
advantage of not requiring complicated statistical manipulations, while 
yielding sufficiently precise results. It establishes directly the ideal 
percentages passing, not only at the age level where the problems are to 
be located, but at all the preceding and following age levels. 

The method requires a minimum of postulates. In fact, it suffices to 
assume that mental ability is normally distributed in the general 
population. This is a postulate found constantly at the basis of most 
intelligence tests. This is not the place to discuss its merits. Let us 
simply point out that it has already been claimed (19) that this has 
been demonstrated experimentally and that, in any event, it is 
practically impossible to construct an intelligence test without having 
recourse to this postulate at one phase or another of the work (item 
analysis, selection of the subtests to be retained, study of validity, 
etc.). 

In terms of intelligence quotients, it may therefore be assumed that 
the mean lies at 100, at each age level, in an unselected population. It 
remains to determine the probable error of this distribution. The choice 
of this probable error is above all a matter of convention. It is a 
question of fixing one which permits sufficient discrimination, lends 
itself to easy interpretation and ensures a dispersion similar to those 
usually found in other intelligence tests. A probable error of 10 
(corresponding, consequently, to a sigma of 14.83) seems to meet these 
requirements fairly well. It is understood that a slightly different 
choice could have been made without in any way affecting the rationale of 
the technique and without appreciably modifying its results. Once these 
parameters have been determined, probability tables make it easy to 
calculate the proportion of the general population lying below each 
possible intelligence quotient. Table II reproduces these values. Thus, 
for example, to an intelligence quotient of 75 there corresponds a 
proportion of 4.59, which means that only 4.59 per cent of subjects in 
the population have an intelligence quotient of 75 or less. 


TABLE II 

NORMAL DISTRIBUTION OF THE POPULATION, IN CUMULATIVE PERCENTAGES, ON A 
SCALE OF INTELLIGENCE QUOTIENTS 
(mean = 100; probable error = 10) 

I.Q.     %      I.Q.     %      I.Q.     %      I.Q.     %      I.Q.     % 

 60     .35      76    5.27      92   29.47     108   70.53     124   94.73 
 61     .43      77    6.04      93   31.84     109   72.81     125   95.41 
 62     .52      78    6.89      94   34.29     110   75.00     126   96.03 
 63     .63      79    7.83      95   36.80     111   77.09     127   96.57 
 64     .76      80    8.87      96   39.37     112   79.09     128   97.05 
 65     .91      81   10.00      97   41.98     113   80.97     129   97.48 
 66    1.09      82   11.24      98   44.63     114   82.75     130   97.85 
 67    1.30      83   12.58      99   47.31     115   84.42     131   98.17 
 68    1.54      84   14.03     100   50.00     116   85.97     132   98.46 
 69    1.83      85   15.58     101   52.69     117   87.42     133   98.70 
 70    2.15      86   17.25     102   55.37     118   88.76     134   98.91 
 71    2.52      87   19.03     103   58.02     119   90.00     135   99.09 
 72    2.95      88   20.91     104   60.63     120   91.13     136   99.24 
 73    3.43      89   22.91     105   63.20     121   92.17     137   99.37 
 74    3.97      90   25.00     106   65.71     122   93.11     138   99.48 
 75    4.59      91   27.19     107   68.16     123   93.96     139   99.57 
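
The entries of Table II can be regenerated from the two stated parameters (mean 100, probable error 10). The sketch below is an illustration, not the authors' procedure (they used printed probability tables): it relies on the fact that the probable error of a normal distribution is the distance from the mean to the 75th percentile, and uses Python's standard `statistics.NormalDist`. The names `SIGMA`, `IQ_DIST` and `pct_at_or_below` are introduced here for convenience.

```python
from statistics import NormalDist

PROBABLE_ERROR = 10
# For a normal curve, one probable error spans mean to the 75th percentile,
# so sigma = PE / z(0.75), with z(0.75) approximately 0.6745.
SIGMA = PROBABLE_ERROR / NormalDist().inv_cdf(0.75)
IQ_DIST = NormalDist(mu=100, sigma=SIGMA)


def pct_at_or_below(iq):
    """Cumulative percentage of the population at or below a given quotient."""
    return round(100 * IQ_DIST.cdf(iq), 2)


print(round(SIGMA, 2))
for iq in (70, 75, 80, 110):
    print(iq, pct_at_or_below(iq))
```

Because the curve is symmetrical about 100, the value for a quotient of 100 - k is 100 minus the value for 100 + k, which gives a quick check on any single entry of the table.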

With these basic data established, the essential part of the technique 
remains to be worked out. The task is to calculate the percentages 
passing which the tests located at each age must yield in order to 
conform to these proportions, that is, so that only 2.15 per cent of the 
subjects of each age have an intelligence quotient of 70 or less, 8.87 
per cent an intelligence quotient of 80 or less, and so on. Now, to make 
this calculation, it is absolutely necessary to take into account the 
structure of the scale one wishes to construct and, principally, the 
number of tests grouped at each age level. Incidentally, let us note that 
Jaspen (13) seems to have been the first to stress this necessity. 
Goodenough and Guilford, in their subsequent work (10, 12), did not fail 
to recognize the importance of this factor. It is indeed on the number of 
tests that the sensitivity of the instrument directly depends, and it is 
this fineness of discrimination which in turn determines the optimum 
percentage passing for each of the tests. 

An example may serve to illustrate this reasoning. Take an age scale 
which perfectly meets the conditions of transitivity and asymmetry and 
which contains only one test at each of the levels, spaced at intervals 
of 6 months of chronological age (1-6 years, 2 years, 2-6 years, 3 years, 
etc.). In an age scale so constructed, it is evident that passing a 
problem located, for example, at 2-6 years automatically gives an 
intelligence quotient of 100 to the subject aged 2-6 years. Failure on 
such a problem by the same subject reduces his intelligence quotient 
ipso facto to 80 or less, whereas passing the item assigned to 3 years 
raises the subject's quotient to 120. Now since, in the general 
population, no more than 50 per cent of the subjects of each age should 
obtain an intelligence quotient of 100 or more, the problems contained in 
a scale of this type should not be passed by more than 50 per cent of the 
children of the age at which they are located. And this is doubtless the 
reasoning which leads most of those who have discussed this problem (8, 
11, 16) to choose 50 per cent as the ideal criterion of success. 

Now consider an age scale that also satisfies the conditions of
transitivity and asymmetry, but that contains six tests at each level,
the levels again being spaced at six-month intervals of chronological
age. In a scale of this kind, passing each test gives only partial credit
(one month). The multiplicity of problems at each age level then makes
possible a much more finely graded distribution of intelligence
quotients. A scale so constructed no longer merely divides subjects into
two classes (those with a quotient of 100 or more vs those with less than
100); among the subjects who do not pass all the items of a given age,
it distinguishes the more retarded from the less retarded. Now this
proportion of subjects who pass only some of the tests of their age must
be added to the proportion (50 per cent) of those who pass all the tests
and thereby obtain an intelligence quotient of 100 or more.
Consequently, as soon as a scale contains more than one test per age
level, the percentages of success to be observed are always above 50, and
the size of this increase depends directly on the number of tests placed
at each level. This, in our opinion, accounts for the fact, left
unexplained by McNemar (14), that if a mean mental age equal to the mean
chronological age is to be obtained in a scale of the Terman-Merrill
type, the percentages of success at the age where the items are placed
never come very close to 50 but range instead between 60 and 75.

The method suggested here can be illustrated by an example. Suppose one
wishes to determine the ideal percentage of success to require in order
to assign to the age of 2-6 years six tests satisfying the principles of
transitivity and asymmetry. Under these conditions, 50 per cent of the
subjects of chronological age 2-6 years, that is, all those whose
intelligence quotient is 100 or more, must pass all six tests. A certain
number of subjects, also aged 2-6 years, will pass only five, four,
three, and so on. Some subjects will pass none. These various
proportions of partial results can be determined from the data of Table
II. The passing of five tests gives the subject a mental age of 2-5 years
and hence, for a subject aged 2-6 years, an intelligence quotient of 97.
In a population of 100, the number of subjects whose intelligence
quotient lies between 97 and 100 exclusive is 8.02 (i.e., 50.00 -
41.98): that is, 8 per cent of the subjects aged 2-6 years will pass only
five of the tests of that age. Similarly, passing four tests out of six
gives the subject a quotient of 93 and, since the normal percentage of
subjects obtaining a quotient from 93 to 97 exclusive is 10.14 (i.e.,
41.98 - 31.84), it follows that 10 per cent of the subjects aged 2-6
years pass only four of the six tests of that age. It is a simple matter
to calculate in the same way the percentage of subjects aged 2-6 years
who will pass three, two, one, or none of the six tests of that age. If
the successes obtained by 100 subjects aged 2-6 years on the whole set of
six tests of that age are now cumulated, the following figures are
obtained:


Number of subjects    Number of tests passed    Combined successes
   (out of 100)            (out of 6)
        50                      6                      300
         8                      5                       40
        10                      4                       40
         7                      3                       21
         6                      2                       12
         6                      1                        6
        13                      0                        0
       ---                                             ---
       100                                             419


TABLE III

DISTRIBUTION OF THE MEAN PERCENTAGES OF SUCCESS TO BE OBSERVED AMONG CHILDREN
AGED 2 TO 12 YEARS ON A SET OF SIX TESTS ASSIGNED TO EACH OF THESE AGES

Chrono-
logical                    Age to which the test is assigned
age         2  2-6    3  3-6    4  4-6    5    6    7    8    9   10   11   12

2          73   19    1    0    -    -    -    -    -    -    -    -    -    -
2-6        96   70   23    2    0    -    -    -    -    -    -    -    -    -
3         100   93   68   26    5    0    0    -    -    -    -    -    -    -
3-6       100   99   91   65   30    7    1    1    -    -    -    -    -    -
4           -  100   98   88   63   32   10    1    0    -    -    -    -    -
4-6         -    -  100   96   85   62   33    7    0    -    -    -    -    -
5           -    -  100   99   95   8?   61   23    2    0    0    -    -    -
6           -    -    -  100   9?   97   91   68   2?    5    0    0    -    -
7           -    -    -    -  100  100   98   91   65   30    7    1    0    -
8           -    -    -    -    -    -  100   98   88   63   32   10    2    0
9           -    -    -    -    -    -    -  100   9?   8?   62   33   12    3
10          -    -    -    -    -    -    -  100   99   9?   8?   61   35   15
11          -    -    -    -    -    -    -    -  100   98    ?   81   61   36
12          -    -    -    -    -    -    -    -  100   9?   9?   91    ?   60

(? = digit illegible in the source copy)


In sum, the six tests must yield 419 successes among the children aged
2-6 years, which gives, for each test, a mean success rate of 69.8 per
cent or, in round figures, 70. Table III (see the figures in bold type,
which fall on the diagonal) shows the result of similar calculations for
each of the age levels between 2 and 12 years. To arrive at these
percentages, it was assumed that the scale to be constructed would
contain six tests at every age level, the levels being spaced at 6-month
intervals before 5 years and at 12-month intervals from 5 years on.
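The tally for the 2-6 year level can be reproduced with the following sketch. It assumes the normal model stated in the article (mean 100, probable error 10) and follows the text in rounding each intelligence quotient and each percentage to a whole number; the names `cum_pct` and `tally` are illustrative only.

```python
from statistics import NormalDist

# Assumed IQ model (stated in the article): normal, mean 100, probable error 10.
SD = 10.0 / NormalDist().inv_cdf(0.75)
IQ = NormalDist(mu=100.0, sigma=SD)

AGE_MONTHS = 30    # chronological age 2-6
BASE_MONTHS = 24   # mental age credited before any 2-6 test is passed
N_TESTS = 6        # each test passed adds one month of mental age


def cum_pct(iq: int) -> float:
    """Percentage of the population with an IQ below `iq`."""
    return 100.0 * IQ.cdf(iq)


def tally() -> list[tuple[int, int]]:
    """Return (tests passed, subjects per 100) for 6 tests down to 0."""
    # IQ, rounded as in the article, earned by passing exactly k tests.
    iq_for = {k: round(100 * (BASE_MONTHS + k) / AGE_MONTHS)
              for k in range(N_TESTS + 1)}
    rows = [(N_TESTS, round(100.0 - cum_pct(iq_for[N_TESTS])))]
    for k in range(N_TESTS - 1, 0, -1):
        rows.append((k, round(cum_pct(iq_for[k + 1]) - cum_pct(iq_for[k]))))
    rows.append((0, round(cum_pct(iq_for[1]))))
    return rows


rows = tally()
total_successes = sum(k * n for k, n in rows)
mean_pct_per_test = total_successes / N_TESTS
print(rows, total_successes, round(mean_pct_per_test, 1))
```

Dividing the combined successes by the six tests gives the mean percentage per test, since the tally is expressed per 100 subjects.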

It is easy to see that the calculated percentages decrease gradually as
age increases. At 6 years, however, there is an abrupt rise in the
percentage of success relative to the preceding age. This is readily
explained when one considers that, from that age on, the six tests of
each level are spread over twelve months instead of six. By failing all
the tests of his age, a subject aged 6 years therefore obtains a mental
age of 5 years (IQ = 83), and a subject aged 5 years a mental age of 4-6
years (IQ = 90). In other words, if one chooses to space the age levels
at twelve-month intervals from 5 years on, one is ipso facto assuming
that the tests of 6 years are relatively easier than those of 5 years
and that, consequently, a failure is more serious at 6 years or, what
amounts to the same thing, the mean percentage of success is higher at 6
years. Clearly, if the age levels had continued to be spaced at 6-month
intervals, with six tests still retained at each level, the distribution
of mean percentages of success would keep its regularity: it would in
fact range from 73 at 2 years to 55 at 12 years.
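The effect of level spacing can be sketched as follows, again assuming a normal distribution with mean 100 and probable error 10. The sketch uses unrounded intelligence quotients, so its results may differ by about a point from the published figures; `floor_iq` and `own_age_mean_pct` are hypothetical helper names. It relies on the identity that the mean number of tests passed equals the sum, over k, of the proportion passing at least k tests.

```python
from statistics import NormalDist

# Assumed IQ model from the article: normal, mean 100, probable error 10.
SD = 10.0 / NormalDist().inv_cdf(0.75)
IQ = NormalDist(mu=100.0, sigma=SD)


def floor_iq(age_months: int, span_months: int) -> int:
    """IQ of a child who fails every test of his own age level."""
    return round(100 * (age_months - span_months) / age_months)


def own_age_mean_pct(age_months: int, span_months: int, n_tests: int = 6) -> float:
    """Mean percentage passing each of n_tests that cover the span_months
    ending at age_months, among children of exactly that age."""
    credit = span_months / n_tests       # months of mental age per test
    base = age_months - span_months      # mental age if every test is failed
    total = 0.0
    for k in range(1, n_tests + 1):
        iq_needed = 100.0 * (base + k * credit) / age_months
        total += 1.0 - IQ.cdf(iq_needed)  # share passing at least k tests
    return 100.0 * total / n_tests


print(floor_iq(60, 6), floor_iq(72, 12))
print(round(own_age_mean_pct(60, 6), 1), round(own_age_mean_pct(72, 12), 1))
print(round(own_age_mean_pct(24, 6), 1), round(own_age_mean_pct(144, 6), 1))
```

With 12-month spacing the floor quotient at 6 years is lower than at 5 years, which is why the mean percentage rises at that level.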

The same technique can then be used to calculate, for the tests placed
at each age level, the number of successes that should be found at the
ages following or preceding the one at which each problem is to be
placed. Suppose, for example, that a child aged 2 years passes, in
addition to the six tests of his age, one test of 2-6 years: he then
obtains a mental age of 2-1 years (IQ = 104). Reference to Table II will
show that 10 per cent of the subjects of a normal population may have
an intelligence quotient from 104 to 108 exclusive. The proportion of
subjects aged 2 years who may pass two, three, four, five, or six of the
six tests of 2-6 years can be determined in the same way. Combining all
the successes of the children aged 2 years on the whole set of tests of
2-6 years then gives a mean percentage of 19. The same procedure sets at
93 the mean percentage of success of children aged 3 years on the tests
of 2-6 years. That is to say, for a problem to be validly assigned to
the age of 2-6 years, the mean percentages of success must reach 19 at 2
years, 70 at 2-6 years, and 93 at 3 years.
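The calculation for neighbouring ages can be sketched in the same way. The code below assumes the article's normal model (mean 100, probable error 10) and works with unrounded quotients, so it approximates rather than reproduces the published values; `mean_pct` and its parameters are illustrative names.

```python
from statistics import NormalDist

# Assumed IQ model from the article: normal, mean 100, probable error 10.
SD = 10.0 / NormalDist().inv_cdf(0.75)
IQ = NormalDist(mu=100.0, sigma=SD)


def mean_pct(child_age: int, level_age: int, span: int = 6, n_tests: int = 6) -> float:
    """Mean percentage of children aged `child_age` (months) passing each of
    the n_tests assigned to `level_age`, which credit the `span` months
    ending at `level_age`."""
    credit = span / n_tests
    base = level_age - span
    # A child passes at least k tests of the level when his mental age
    # reaches base + k * credit months.
    shares = [
        1.0 - IQ.cdf(100.0 * (base + k * credit) / child_age)
        for k in range(1, n_tests + 1)
    ]
    return 100.0 * sum(shares) / n_tests


# Children aged 2, 2-6, and 3 years on the six tests of the 2-6 year level.
for age in (24, 30, 36):
    print(age, round(mean_pct(age, 30), 1))
```

The three results should lie within about a point of the 19, 70, and 93 per cent given in the text.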

Table III reproduces all the results obtained by making similar
calculations for each of the age levels. These percentages indicate the
mean number of successes of children aged 2 to 12 years on the whole set
of six problems to be assigned to each of these ages. It is these
percentages that should guide the selection and placement of the tests
to be included in an age scale. Applying these criteria should, it
seems, lead directly, that is, without trial and error, to the
construction of a scale satisfying the conditions of transitivity and
asymmetry and yielding similar means and dispersions at all ages.

At the close of this laborious study, three remarks are in order. The
first concerns the percentages reproduced in Table III. It is important
to note that these percentages depend directly on the hypotheses made at
the outset about the distribution of intelligence quotients in the
general population. It will be recalled that, in the present case, the
distribution was assumed to be normal, with a probable error of 10 points
about a mean of 100. Had a larger probable error been chosen, 12.5 for
example, the figures obtained would all be slightly different. Thus the
mean percentages of success, at the age level to which each test is
assigned, would range from 70 at 2 years to 57 at 12 years. The choice of
probable error therefore remains purely conventional and depends only on
the dispersion one wishes to obtain. It in no way affects the value of
the proposed technique.
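The dependence on the chosen probable error can be illustrated with a small sketch that makes the probable error a parameter. It assumes six tests per level, 6-month spacing at 2 years, and 12-month spacing at 12 years, as described in the text; quotients are left unrounded, so results are approximate, and the function name is hypothetical.

```python
from statistics import NormalDist

Z_PE = NormalDist().inv_cdf(0.75)  # one probable error is about 0.6745 SD


def own_age_mean_pct(age_months: int, span_months: int,
                     probable_error: float, n_tests: int = 6) -> float:
    """Mean percentage of children of a given age passing each of the
    n_tests assigned to that age, for a chosen probable error of IQ."""
    iq = NormalDist(mu=100.0, sigma=probable_error / Z_PE)
    credit = span_months / n_tests
    base = age_months - span_months
    shares = [
        1.0 - iq.cdf(100.0 * (base + k * credit) / age_months)
        for k in range(1, n_tests + 1)
    ]
    return 100.0 * sum(shares) / n_tests


for pe in (10.0, 12.5):
    at_2_years = own_age_mean_pct(24, 6, pe)
    at_12_years = own_age_mean_pct(144, 12, pe)
    print(pe, round(at_2_years, 1), round(at_12_years, 1))
```

A wider assumed dispersion lowers every own-age percentage slightly, which matches the direction of the change reported in the text.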

Secondly, it should be observed that the calculated percentages
represent only means. When tests are selected and placed, it is
therefore not necessary to adhere rigidly to the suggested figures and
to reject every test that, at a given age, has a percentage of success
above or below the ideal criterion. One should rather seek to bring
together problems that come as close to it as possible. In sum, the more
closely these criteria are followed, the better the value of the scale
as a measuring instrument will be assured. The validity of the
instrument will then remain to be demonstrated by an independent study.

The last observation is more theoretical and would call for a rather
lengthy development. Let us simply point out that requiring in principle
the same mean percentage of success for each of the subtests assigned to
an age is apt to raise the delicate problem of the homogeneity of the
scale so constructed. Other things being equal, a child of seven, for
example, who passes only four of the six tests of seven years obtains a
lower quotient than the child of his age who passes them all. Given that
all these tests have theoretically the same difficulty, it must be
assumed that each measures a different aptitude and that, in a scale of
this nature, it is ultimately a sum of varied abilities that defines the
mental level. It is therefore not impossible that, for a given subject,
two problems of similar difficulty for the group do not present the same
difficulty. Here we meet once again the conditions of transitivity and
asymmetry, but this time at the level of individual performances. In our
view, it is nonetheless important to keep equal, for a given age, the
percentages of success of each of the subtests assigned to that age.
Whereas, to obtain some constancy in the mean intelligence quotients of
each age level, a scale such as Terman's is almost forced to rely on the
relative absence of these conditions of transitivity and asymmetry even
at the level of collective performances, it seems more justifiable to
aim at the same goal by requiring from the outset that these conditions
be realized at least for the group. Discussion of this problem would
require a detailed study of the hypotheses underlying scales of mental
development and, above all, a critical account of the difference between
an age scale and a point scale. Suffice it to mention that the concept
of homogeneity probably does not have a single meaning in these two
kinds of scale. The problem raised by combining different tests at each
level of an age scale is, moreover, not so very different from the one
posed by combining the different subtests of a point scale. This is a
difficult question to which it will be necessary to return. Be that as
it may, it seems clear that the solution of this problem does not
essentially affect the value of the technique suggested in the present
paper.


SUMMARY


The placement of tests in age scales of mental ability (e.g., Stanford- 
Binet, Terman-Merrill) has been mainly the product of trial and error, 
test items being selected, grouped, and re-grouped until mean IQ’s are 
stabilized around 100 at all age levels. Owing to this lack of precision 
there is considerable variation in the means and dispersions of IQs 
secured at different ages, and the same mental age can be obtained by 
widely differing performances. Such inconsistencies are chiefly attribut- 
able to two facts: the difficulty of items assigned to each age level has 
been too arbitrarily determined, and the discriminating value of the 
problems has been neglected. 

The authors present a simple technique which determines in advance: 
(a) the ideal item difficulty for problems placed at a given age level, and 
(b) the percentages of success required at preceding and following age 
levels if the items are to have maximum discriminating power. Scales 
constructed according to this method will come closer to satisfying the 
conditions of transitivity and asymmetry required of any measuring 
instrument.


THE EFFECTS OF TERMINATION OF THE CS AND AVOIDANCE
OF THE US ON AVOIDANCE LEARNING: AN EXTENSION¹


LEON J. KAMIN 
Queen’s University 


THE STUDY reported here extends an earlier one (6) which examined
separately the effects of termination of the CS and avoidance of the US 
on avoidance learning. The assessment of the relative importance of 
these two factors in avoidance training is of some theoretical moment, 
since S-R and cognitive theories of learning differ as to which of the 
factors is regarded as the reinforcement for learning. The cognitive or 
“expectancy” theories (3, 10, 11) regard avoidance of the US as the re- 
inforcement, while S-R theories (8, 9, 12) attribute reinforcement to 
response termination of the CS, which has been paired with a noxious US. 
The bulk of experiments on avoidance confound these two effects. 

The earlier study, employing rats in a shuttlebox as subjects, was 
factorially designed to cross-cut termination and non-termination of 
the CS with avoidance and non-avoidance of the US. The analysis of 
variance of the data indicated that both termination of the CS and avoid- 
ance of the US markedly facilitated frequency and latency of CR’s, and 
that interaction was negligible. The present study repeats the earlier 
design and procedure with a single change: the CS-US interval, formerly 
5 seconds, has been lengthened to 10 seconds. This extension was dic- 
tated by theoretical considerations which suggested (6) that the per- 
formance of some experimental groups would be strongly affected 
thereby. 


METHOD 

Subjects and Apparatus 

The subjects were 32 experimentally naive hooded rats, ranging in age from 
about 3 to 4 mo. The rats, maintained on an ad lib feeding schedule, were 
randomly assigned to experimental groups. The apparatus was a modified Mowrer- 
Miller shuttlebox. The dimensions of the box were: length 36 in., width 5 in., and 
height 4% in. There was no barrier between the two halves of the box, but a 
metal molding projected % in. into the box from the centre of each side wall to 
demarcate the two halves. 


¹This research was supported by a grant from the Associate Committee on Applied
Psychology of the National Research Council of Canada. Mr. Charles Hockman
gave experimental assistance.

The US (electric shock) was administered through a grid floor. The grid of
each half of the box could be separately charged. The intensity of the US was 
varied slightly from S to S to maintain a constant behavioural effect. The current 
flow to the average S was about 1.2 ma. This regularly elicited high-pitched 
squealing and vigorous running movements, and often defecation and urination. 
The CS was an ordinary buzzer mounted at the rear of the apparatus. 

Design 

There were 4 experimental groups with 8 Ss in each group, constituting a simple
2 × 2 factorial design. The 4 groups were: the “Normal” Ss, who by responding
could both terminate the CS and avoid the US; the “Terminate-CS” Ss, who could
only terminate the CS; the “Avoid-US” Ss, who could only avoid the US; and the
“Classical” Ss, who could neither terminate the CS nor avoid the US.


Procedure 


The S was placed in the apparatus and left unmolested for 5 min. The CS was 
then presented for the first time, followed 10 sec. after its onset by the US. The 
CS and US continued to act until S (usually very promptly) ran to the other 
half of the apparatus, at which time both were turned off. This procedure was 
identical for all groups and was repeated throughout training whenever an S had 
failed to run to the opposite side of the box within 10 sec. after onset of the CS. 
Trials of this nature are classed as escapes. 

There were, however, many trials during which S responded by running after 
the onset of the CS and before the scheduled onset of the US. Trials of this nature 
are classed as “CR’s.” When a CR occurred, the experimental procedure varied 
from group to group. With a Normal S, a CR was followed immediately by 
termination of the CS, and no US was delivered for that trial. With a Terminate- 
CS S, a CR was followed immediately by termination of the CS; but 10 sec. after 
onset of the CS the US was delivered to whatever side of the box S occupied. This 
shock, like all others, was terminated by S’s running to the opposite side of the box. 
With an Avoid-US S, a CR caused the US to be omitted for that trial, but had no 
effect on the CS, which continued to act for its usual 10 sec. total. With a Clas- 
sical S, the CR had no effect on the CS, and the US was invariably applied to what- 
ever side of the box S occupied. The US again was terminated by running. 
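The contingencies described above can be summarized in a small lookup, offered only as a sketch of the design and not as the apparatus logic actually used. The names `GROUPS`, `classify`, and `consequences` are hypothetical, and treating a run at exactly 10 sec. as an escape is an assumption.

```python
CS_US_INTERVAL = 10.0  # seconds between CS onset and scheduled US onset

# For each group: (a CR terminates the CS at once, a CR cancels the US).
GROUPS = {
    "Normal": (True, True),
    "Terminate-CS": (True, False),
    "Avoid-US": (False, True),
    "Classical": (False, False),
}


def classify(latency: float) -> str:
    """Label a trial by response latency (seconds after CS onset)."""
    # Assumption: a run at exactly 10 sec. counts as an escape.
    return "CR" if latency < CS_US_INTERVAL else "escape"


def consequences(group: str, latency: float) -> dict:
    """Describe what follows the running response on one trial."""
    if classify(latency) == "escape":
        # Identical for all groups: CS and US both stay on until S runs.
        return {"trial": "escape", "cs_ends_with_response": True,
                "us_delivered": True}
    terminates_cs, avoids_us = GROUPS[group]
    return {
        "trial": "CR",
        "cs_ends_with_response": terminates_cs,
        "us_delivered": not avoids_us,
    }


for name in GROUPS:
    print(name, consequences(name, 4.0))
```

Crossing the two Boolean factors in this way yields the four cells of the 2 × 2 design.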

There were, for all Ss, 100 trials conducted in a single experimental session. 
The interval between trials was set at a fixed, irregular schedule. The inter-trial 
intervals were 50, 60, and 70 sec., with a mean of 60 sec. The CS-US interval for 
all groups was 10 sec. 


Measures 


The frequency of both CR’s and escapes was tabulated, and latencies were 
recorded by stopwatch to the nearest .10 sec. CR’s were recorded whenever S ran 
to the opposite side of the box within 10 sec. after onset of the CS. The frequency of 
“spontaneous” inter-trial runs across the box was also tabulated. 


RESULTS 


There were no significant differences among groups for the trial on 
which the first CR occurred, for the latency of the first CR, or for the 
number of spontaneous runs made during the initial 5-min. observation
period. This indicates satisfactory random assignment of Ss to groups.

Table I presents, by experimental groups, data for the number of 
CR’s recorded, for the median latency of CR’s, and for the number of 
spontaneous responses. The latency data were determined by calculating 
for each S the median latency of its own CR’s. 

The acquisition of the CR by all groups is plotted in Figure 1 by blocks 
of 10 trials. The Normals, clearly surpassing all other groups in perform- 
ance, rapidly acquire a stable CR, achieving an asymptote in about 50 
trials. The curves for both the Terminate-CS and Classical rats decline 
slightly after an initial rise. The Terminate-CS group shows only a very 
slight superiority to the Classical group. The curve for the Avoid-US 
Ss shows a continued rise throughout training. These Ss are inferior to 
the Terminate-CS and Classical rats during the earlier trials, but surpass 
both these groups in the later trials. 


TABLE I
RESPONSE MEASURES FOR FOUR EXPERIMENTAL GROUPS

                                            Experimental group
Measure                          Normal   Terminate-CS   Avoid-US   Classical

Number of CR’s
  Mean                            82.3        36.4         36.6        31.6
  Median                          86.5        37.5         42.5        32.5
  Range                          62-89       20-50         9-56       16-44
Median CR latency (sec.)
  Mean                            2.28        4.56         4.86        5.47
  Median                          2.25        4.70         4.50        5.18
  Range                         1.4-3.2     2.4-7.2      2.3-7.8     4.0-7.7
Number of spontaneous responses
  Mean                            18.4        18.5         22.4        14.9
  Median                          12.0        16.0         11.5        15.5
  Range                           3-56        3-37         0-74        1-33


Inspection of Figure 1 suggests the possibility of interaction effects 
associated with blocks of trials, which were not observed in the earlier 
experiment. Therefore, the analysis of variance of the present data was 
converted to a Lindquist Type III design (7). This was done by in-
cluding as an effect two blocks of trials (Trials 1-50 and 51-100) for each
S. The summary of this analysis of variance is given in Table II. 

Table III summarizes an analysis of variance of the data for median
CR latency. These data were not analysed by blocks of trials, since the
small number of CR’s made by some Ss in some blocks would have made
individual block medians highly unreliable.

The extreme variability and irregular distributions of the data for
spontaneous responding precluded an analysis of variance. The Mann-
Whitney U test, however, indicated no significant difference between
any pair of experimental groups.


TABLE II
SUMMARY OF ANALYSIS OF VARIANCE OF CR FREQUENCY

Source                                          df        MS         F        p

Termination of CS                                1     2537.64     36.56    <.001
Avoidance of US                                  1     2588.27     37.29    <.001
Termination of CS × avoidance of US              1     1670.77     24.07    <.001
Error (b)                                       28       69.41

Trials                                           1      708.89     27.91    <.001
Trials × termination of CS                       1   [illegible]     -
Trials × avoidance of US                         1     1233.77     48.57    <.001
Trials × termination of CS
  × avoidance of US                              1         .01       -
Error (w)                                       28       25.40


TABLE III
SUMMARY OF ANALYSIS OF VARIANCE OF MEDIAN CR LATENCIES

Source                                          df        MS         F        p

Termination of CS                                1      2432.5     11.68    <.005
Avoidance of US                                  1      1667.5      8.01    <.01
Termination of CS × avoidance of US              1       541.6      2.60    <.20
Error                                           28       208.3
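The F ratios in Tables II and III can be checked as each effect mean square divided by the appropriate error mean square: Error (b) for the between-subjects effects and Error (w) for the within-subjects effects of Table II. The sketch below only re-does that arithmetic on the mean squares printed in the tables; the variable and function names are illustrative.

```python
# Mean squares transcribed from Tables II and III of the article.
TABLE_II_BETWEEN = {
    "Termination of CS": 2537.64,
    "Avoidance of US": 2588.27,
    "Termination of CS x avoidance of US": 1670.77,
}
ERROR_B = 69.41

TABLE_II_WITHIN = {
    "Trials": 708.89,
    "Trials x avoidance of US": 1233.77,
}
ERROR_W = 25.40

TABLE_III = {
    "Termination of CS": 2432.5,
    "Avoidance of US": 1667.5,
    "Termination of CS x avoidance of US": 541.6,
}
ERROR_III = 208.3


def f_ratios(effects: dict, error_ms: float) -> dict:
    """Return F = MS(effect) / MS(error), rounded to two decimals."""
    return {name: round(ms / error_ms, 2) for name, ms in effects.items()}


between = f_ratios(TABLE_II_BETWEEN, ERROR_B)
within = f_ratios(TABLE_II_WITHIN, ERROR_W)
latency = f_ratios(TABLE_III, ERROR_III)
print(between, within, latency, sep="\n")
```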

To facilitate comparison of the data from the present and earlier 
experiments, the data for total number of CR’s were pooled for a single 
analysis of variance. This analysis, of course, included CS-US interval 
(experiments) as an effect. The pooled analysis did not include blocks 
of trials as an effect. The summary of this analysis is given in Table IV. 


TABLE IV
SUMMARY OF ANALYSIS OF VARIANCE OF CR FREQUENCY,
POOLING DATA FROM TWO EXPERIMENTS

Source                                          df        MS         F        p

CS-US interval                                   1        8.27       -
Termination of CS                                1    10480.64     59.14    <.001
Avoidance of US                                  1    12404.39     69.99    <.001
CS-US interval × termination of CS               1        2.64       -
CS-US interval × avoidance of US                 1       92.64       -
Termination of CS × avoidance of US              1     2512.52     14.18    <.001
CS-US interval × termination of CS
  × avoidance of US                              1     1000.90      5.65    <.05
Error                                           56      177.22


DISCUSSION


The effects in the present experiment (Table II) of both termination 
of the CS and avoidance of the US are highly significant, with both factors 
contributing to CR frequency. There is, however, a significant interaction 
between the two factors. The superiority of the Normal rats over the 
Avoid-US rats is very much greater than is the difference between the 
Terminate-CS and Classical groups.

Turning to the within-subjects effects in Table II, the significant effect
of trials reflects learning. There is only one significant within-subjects 
interaction—that between trials and avoidance of the US. The meaning 
of this interaction is made clear by inspection of Figure 1. There is, 
in the early trials, no significant main effect of avoidance of the US, but 
this effect is overwhelming in the later trials. Parenthetically, the per- 
formance of the Avoid-US group during the early trials is inferior even 
to that of the Classical Ss (p=.05). 

These data are amenable to an S-R analysis, but the analysis is not 
simple. We first assume that response termination of the secondarily 
noxious, fear-eliciting CS reinforces instrumental responses. This re- 
inforcement of CR’s must occur for both the Normal and Terminate-CS 
groups. The same reinforcement, however, should occur in an attenu- 
ated form for the Avoid-US Ss. The CS, for these Ss, does terminate some 
few seconds following a CR. Thus, we postulate a delayed secondary 
reward of CR’s for the Avoid-US group. This delayed reinforcement 
should be more gradually cumulative over trials than is immediate re- 
inforcement. 

The role of shock in reinforcement is similarly complicated (2). We 
assume that the first function of the US is to reinforce classically con- 
ditioned fear of the CS. Thus, withholding the US on a given trial 
must weaken the CS, and we must assume the Normal and Avoid-US 
groups to be less fearful of the CS than are the groups with inevitable 
shock². The US, however, must also serve to punish (suppress) instru-
mental responses which precede it in time. Thus, withholding the US
must also have the effect of allowing reinforced instrumental responses 
to the CS to develop unimpeded by punishment. 

These theoretical considerations endow avoidance training with a 
frightening complexity. They are, however, at least consonant with, if 
not necessary deductions from, most extant S-R theories of learning. 
We can now ask how these various factors might interact to affect 
the performances of our experimental groups. 

Theoretically, Normal Ss are less afraid of the CS, but their CR’s are 
immediately rewarded by termination of the CS, and are never punished. 
Thus, a relatively smooth and rapid acquisition of the CR might ensue. 
The Avoid-US rats, in early trials, undergo a decrement in fear and as 
yet derive little benefit from delayed CS-termination; in later trials the 
cumulative effect of delayed CS-termination should elevate their
performance. The Terminate-CS group should be maximally afraid of the
CS and should have each CR promptly reinforced by CS termination,
but should suffer a cumulative effect of punishment. The initial rise and
later decline of their performance curve might indicate such a sequence
of events. The Classical Ss, similar to the Terminate-CS rats but without
prompt CS-termination, might exhibit a parallel curve at a lower level.³

²The argument assumes that a small number of CS-US pairings can establish a
fear of the CS sufficiently resistant to extinction to motivate a large number of re-
inforced instrumental responses to the CS, without renewed pairing of CS and
US. There exists considerable evidence on this point (4, 14, for example).

This involved argument provides a largely ex post facto rationale for 
the shape of the curves of Figure 1, and thus for the data of Table II. 
Without independent experimental data on the acquisition of fear, and 
on the effects of delayed secondary reward (9) and delayed punishment 
(18), it can scarcely be convincing. The function of the argument is 
merely to indicate that S-R theories provide a framework of explanatory 
principles within which to attack the empirical facts. 

The most embarrassing fact for an S-R theory is, of course, the 
facilitating effect of avoidance of the US. We have tried to explain 
this away by equating avoidance of the US with withholding of punish- 
ment for a CR reinforced by termination of the CS. While the earlier 
study found avoidance of the US to be a significant factor throughout 
training, the present experiment does reduce the significance of this 
effect to the later trials of training. This change presumably occurred 
because the lengthening of CS-US interval increased the delay of reward 
assumed to be operative for the Avoid-US groups. 

Whether current cognitive learning theories can incorporate the data 
seems, to the writer, unclear. The embarrassing effect of CS-termina- 
tion might be attributed to some “emphasis” phenomenon, though the 
behaviour of the Avoid-US rats seems inexplicably “stupid.” What, 
however, is there in cognitive theory to account for the fact that 
avoidance of the US does not contribute to CR frequency early in 
training? Presuming that the rat makes his early responses because he 
expects to be shocked if he does not, why are these responses not 
immediately strengthened? The early contention that expectancy theory 
is “directly applicable” to avoidance learning (3, p. 99) has not weathered 
well. 

The latency data for the present study (Table III) confirm in general 
the data on CR frequency. The effects of CS-termination and US- 
avoidance are significantly facilitating, but in this case their interaction 
is not significant. 

The pooled data on CR frequency for the two studies (Table IV)
supplement the foregoing analyses. CS-US interval has no effect on
CR frequency. This finding is consonant with the data of Brush, Brush,
and Solomon (1), who show CS-US interval to be of relatively little
significance in delayed as opposed to trace (5) avoidance training. The
main effects of CS-termination and of US-avoidance are clear, and the
interaction between them is significant, as is the interaction between
CS-termination, US-avoidance, and CS-US interval.

³The difference in performance between the Normal and Terminate-CS groups
forces the conclusion that the cumulative effect of punishment outweighs the cumu-
lative effects of CS-termination.

The differences in performance between similar groups in the two 
experiments require some examination. We had expected (6) that the 
increase of CS-US interval would depress the performance of the Avoid- 
US group while facilitating the performance of the Terminate-CS rats. 
This should have occurred since an increase of CS-US interval, having 
a relatively small effect on CR latency, tends to increase the delay of 
reward and the delay of punishment assumed, respectively, to be opera- 
tive for these two groups. The performance of the Avoid-US rats, during 
the first 50 trials, was significantly poorer in the present study than in the 
first one (p=.05). The Terminate-CS rats of the present experiment, how- 
ever, do not differ significantly from their counterparts in the earlier study. 
Post hoc we might assume the gradient for the delay of punishment to be 
less steep than that for the delay of secondary reward, but the need for
independent and precisely controlled studies of these hypothesized
gradients is obvious.


SUMMARY 


The effects of response termination of the CS and of avoidance of the
US on avoidance learning were studied with a 2 × 2 factorial design
employing 32 hooded rats as Ss in a shuttlebox. There were significant
(facilitating) main effects of both factors, and a significant interaction
between them. The analysis of the data by blocks of trials indicated
that the effect of avoidance of the US was significant only during the
later trials of training. The data of the present and of a related earlier
study were tentatively interpreted in terms of the interaction of three
factors: fear of the CS; reward of instrumental responses by termination
of the CS; and punishment of instrumental responses by delivery of the
US.


THEORETICAL PREDICTIONS AND RESPONSE MEASURES


IRENE MACKINTOSH
University of Oklahoma¹


DISCRIMINATION reversals and related phenomena have recently been the 
subject of much experimental investigation and theoretical discussion 
(6, 8). In general, the experimental data indicate that when a series of 
similar problems is learned, there is a gradual and orderly increase in 
learning efficiency (3, 5). As successive reversals are presented, there is 
at first negative transfer, which gradually decreases until eventually fewer 
trials are required to learn each new reversal than were required for the 
original learning (2, 6). 

The continuous development of inter-problem transfer can be readily 
dealt with by the concepts employed by Hull (4) and Spence (7), but 
increasing learning efficiency during habit reversals cannot be predicted 
on the basis of these theories. North (6) has discussed this problem in 
detail and has indicated the inadequacy of various current theories. The 
particular inadequacy of a theory can be specified more clearly if the 
theory is used to make exact predictions which can then be tested 
experimentally. 

In the successive reversal problems, the conditions of reinforcement 
are alternated, so that one response is rewarded for one block of trials, 
the other response is rewarded for a second block of trials, and so on 
through a series of alternations. Little or no experimental investigation 
has been devoted to “irregular” reversals, that is, reversals in which a 
particular response is rewarded for a variable number of trials, after 
which the other response is rewarded, again for a variable number of 
trials. 

Hull (4, p. 21 f.) has traced quantitatively the characteristic events 
during distributed-trials simple trial-and-error learning with various 
combinations of $_{S}E_{R+}$ and $_{S}E_{R-}$. Some data have been presented which 
support the theoretical predictions. In Hull’s analysis the correct or 
appropriate response is always reinforced and the incorrect or inappropriate 
response is never reinforced. He has not attempted to derive the 
theoretical response probabilities where different responses are reinforced 
on different trials. In the present investigation Hull’s theorems were used 
to calculate the theoretical probability for response occurrence under 
such conditions, and the validity of the theoretical predictions was then 
tested experimentally. 

¹The data presented in this paper were collected at the University of Nebraska. 

The experimental data were the number of responses made on each 
of two manipulanda when, without differential cues signifying appro- 
priateness, a response on one manipulandum was reinforced on some 
trials, while a response on the other manipulandum was reinforced on 
other trials. For clarity of exposition, the experimental data are presented 
first. 


EXPERIMENTAL DATA 

Experimental Procedure 

The data of this study come from the acquisition phase of an investigation on 
experimental extinction. The results of present interest are for eight rats from the 
University of Nebraska colony, four males and four females, all approximately 
120 days old at the beginning of the experiment. Briefly, the apparatus consisted 
of three compartments: a starting box, a response box, and a goal box. In the 
response box were three manipulanda, a chain, a horizontal bar, and a vertical bar. 
These subjects received no training on the vertical bar, and, after the first few 
trials, they seldom approached it. On any given trial, operation of one, and only 
one, of the manipulanda would open the door to the goal box, where a primary 
reward (food or water) was placed. The manipulandum to be associated with 
primary reinforcement was randomly predetermined and, during 90 acquisition trials, 
each of the two manipulanda used was appropriate for 45 trials. It was possible 
for the rat to make a large number of responses on the inappropriate manipulandum 
before making the appropriate response. However, when one appropriate response 
was made, reinforcement was available and the trial was completed. The interval 
between trials was approximately 10 minutes, but there was a 24-hour interval after 
trials 4, 12, 22, 38, 56, and 74, since acquisition training lasted for 7 days. 


RESULTS 


The mean numbers of inappropriate responses per trial for trials 26 to 
56 are presented graphically in Figure 1. Appropriate responses are not 
included, but one appropriate response was, of course, made just prior to 
the terminal reinforcement on each trial. To save space only a portion of 
the 90 trials are shown, those in the middle of the series being chosen as 
most relevant to the later discussion. The cross-hatched bars represent 
inappropriate responses on the bar, that is, responses made on the bar 
when chain-pulling was the appropriate response. The plain bars repre- 
sent inappropriate responses on the chain, in trials when bar-pressing was 
the appropriate response. The familiar learning effects will be noted: two 
or more successive reinforcements of a given response are accompanied 
by a progressive decrease in inappropriate responses, followed by a sharp 
rise in the number of inappropriate responses when reinforcement is 
shifted to the alternate response. 

Figure 1. Mean numbers of inappropriate responses per trial for 31 of 90 acquisition 
trials. 


Another trend, more noticeable when all 90 trials are considered, is a 
gradual decline in the number of inappropriate responses. This decline 
is apparent in Figure 2, which shows the mean number of inappropriate 
responses made in each block of 10 trials. For the last 10 trials the animals 
averaged less than one inappropriate response per trial, that is, they 
seldom repeated an unrewarded response, but shifted to the other 
manipulandum. Had maximum learning been achieved (one inappropriate 
response per reversal) the mean number of errors would have been ap- 
proximately .5, since each manipulandum was appropriate in half the 
trials. Figure 2 also shows the mean theoretical reaction potential ($_{S}E_{R}$) 
for the inappropriate response, which is approximately constant through 
the nine 10-trial blocks. 
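
The figure of .5 can be checked with a short sketch. It rests on two idealizations that are not stated in this form in the text: that the 45 chain trials and 45 bar trials fall in a uniformly random order, and that a perfectly efficient animal makes exactly one inappropriate response on each trial where the appropriate manipulandum differs from that of the preceding trial. The function and variable names are illustrative only.

```python
from fractions import Fraction

def expected_reversals(n_a: int, n_b: int) -> Fraction:
    """Expected number of adjacent-trial changes in a random ordering of
    n_a trials of one kind and n_b trials of the other."""
    n = n_a + n_b
    # Any adjacent pair of trials differs with probability
    # 2*n_a*n_b / (n*(n-1)), and there are n-1 adjacent pairs.
    return Fraction(2 * n_a * n_b, n)

reversals = expected_reversals(45, 45)   # assumed random 45/45 sequence
errors_per_trial = reversals / 90        # one error per reversal, 90 trials
print(reversals, float(errors_per_trial))
```

Under these assumptions the expected number of reversals is 45, which spread over 90 trials gives the one-half error per trial mentioned above.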

The calculation of theoretical $_{S}E_{R}$ for each trial is described below. 

Figure 2. Mean numbers of inappropriate responses made by 8 rats on blocks of 10 
trials, with mean theoretical $_{S}E_{R}$ for the inappropriate response per block of 10 trials. 


THEORETICAL PROBABILITY CALCULATIONS² 


For the specific sequence of bar and chain reinforcements used here, the 
theoretical probability of response occurrence was calculated for both the appropriate 
and inappropriate responses, and for each of the 90 trials. The assumption was 
made that at the beginning of acquisition training there was a very low reaction 
potential for each of bar-pressing and chain-pulling, and that the two reaction 
potentials were equal. This initial reaction potential was arbitrarily taken to be 
lg unit. Hull (4, p. 21) was followed in letting each increment in reaction potential 
be given by 

$$\Delta\,{}_{S}E_{R+} = (M - {}_{S}E_{R+}) - (M - {}_{S}E_{R+})\,10^{i}$$
where $M = 6$ and $i = -.091$. 
Hull tentatively assumed, admittedly without adequate evidence, that conditioned 
inhibition ($_{S}I_{R}$) follows the same law, with the same constants as the reaction 
potential. Thus, in his calculations, 


Aely- =,J,_ = (et r~) 10~-°!, 


In the present study several non-reinforced responses could be made on a given 
trial, and consequently more $_{S}I_{R-}$ was probably generated. Therefore, in 
calculating the $_{S}I_{R-}$ for each trial, i was assumed to equal .30 and 


er 
Vy =,J,_ - (eJ>~-)10~-*, 


On the basis of calculated probability, inappropriate responses would be made 
by only some of the subjects on a given trial. To take this into account, the 
inhibitory increment for each trial was multiplied by the probability for the 
occurrence of the response. In agreement with Hull (4, p. 24), it was assumed that the 
reaction potential from the appropriate response generalized to the extent of 20 
per cent to the inappropriate response. However, because of the possibility of a 
large number of inappropriate responses, it was assumed that the inhibitory potential 
generalized 30 per cent (rather than 20 per cent as Hull assumed) from the 
inappropriate response to the appropriate response. 

²A detailed description of the calculations is available from the author. 

The operations described above were used to calculate the theoretical reaction 
potential for each of the two responses on each trial. On the basis of the theoretical 
reaction potentials, the probability for the occurrence of each response was deter- 
mined according to procedures used by Hull (4, p. 26). 
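
As a rough illustration of the reinforcement side of these calculations only, the sketch below applies the increment rule for reaction potential with M = 6 and a growth constant of magnitude .091. The starting value of 0.5 is a hypothetical placeholder, and the sketch deliberately omits conditioned inhibition, generalization between responses, and oscillation, so it does not reproduce the author's trial-by-trial values. All names are illustrative.

```python
M = 6.0        # asymptote of reaction potential, as given in the text
RATE = 0.091   # magnitude of the growth constant given in the text
START = 0.5    # hypothetical starting value, chosen only for illustration

def increment(e: float, m: float = M, rate: float = RATE) -> float:
    """Increment in reaction potential after one reinforcement:
    (M - E) - (M - E) * 10**(-rate)."""
    remaining = m - e
    return remaining - remaining * 10 ** (-rate)

def growth(n: int, start: float = START) -> list[float]:
    """Reaction potential after 0, 1, ..., n successive reinforcements."""
    values = [start]
    for _ in range(n):
        values.append(values[-1] + increment(values[-1]))
    return values

curve = growth(45)   # 45 reinforcements, one response's share of 90 trials
print([round(v, 2) for v in curve[:4]], round(curve[-1], 2))
```

Each reinforcement closes a fixed fraction of the remaining distance to M, so the increments shrink as the potential approaches its ceiling.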


COMPARISON OF THEORETICAL PREDICTIONS WITH EXPERIMENTAL DATA 


For comparison of the theoretical predictions with the experimental 
data, the proportion of the total responses which was inappropriate was 
calculated for each trial. Proportion units are more directly comparable 
to the theoretical probability units than are the absolute numbers of 
inappropriate responses shown in Figure 1. The proportions were com- 
puted by dividing the mean number of inappropriate responses for a 
given trial by the mean total number of responses made on that trial. 

Figure 3 represents the proportion of inappropriate responses for each 
trial from trial 26 to trial 56, together with the theoretical probability 
for their occurrence. For these 31 trials the number of inappropriate 
responses follows, in general, the changes predicted by theory. 

To assess the degree of agreement between the experimental data and 
the theoretical predictions throughout the series of 90 trials, a Pearson r 
was computed. The resulting r is .30 ($\sigma_r$ = .10), indicating a significant 
degree of agreement (p= .01). Correlations computed separately for 
the first, middle, and last third of the trials were as follows: 


Trials 1-30     r = .22 (n.s.) 
Trials 31-60    r = .53 (p < .01) 
Trials 61-90    r = .10 (n.s.) 
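
The reported standard error for the over-all correlation can be checked with a few lines. The paper does not say which formula was used; the sketch assumes the classical large-sample approximation, (1 - r²)/√(N - 1), with N taken as the 90 trials. The function name is illustrative.

```python
import math

def standard_error_of_r(r: float, n: int) -> float:
    """Classical large-sample standard error of a Pearson r (assumed formula)."""
    return (1 - r ** 2) / math.sqrt(n - 1)

r_all, n_trials = 0.30, 90            # values reported in the text
se = standard_error_of_r(r_all, n_trials)
critical_ratio = r_all / se           # r expressed in standard-error units
print(round(se, 2), round(critical_ratio, 2))
```

Under this assumption the standard error comes to about .10, in line with the value quoted, and the correlation is a little over three standard errors from zero.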


DISCUSSION 


While there is an over-all correspondence between the experimental 
data and theoretical predictions, this is clearly due to the high correla- 
tion in the middle of the series. Lack of correspondence in the initial and 
final sections may be due to the same cause, namely, a steady and gradual 
improvement in the rats’ ability to change the response promptly. The 
decline in the number of inappropriate responses, indicated in Figure 2, 
is an example of increasingly successful reversal performance. North (6) 
has pointed out that Hull’s theory, in its present form, will not predict 
successful reversals. The constant level of theoretical reaction potential 
presented in Figure 2 supports North’s contention. North has suggested 
two further assumptions which might permit such predictions, but these 
still require testing. 

Figure 3. Proportion of inappropriate responses per trial for trials 26 to 56, and 
corresponding theoretical probability for the occurrence of inappropriate responses. 

As far as the middle trials are concerned, the ability to predict these 
data from Hull’s postulates gives some support to his theory, particularly 
postulates VIII and IX, and theorems 1A and 1F. However, the significant 
correlation must be interpreted cautiously since, as is indicated in 
Figure 2, the theoretical and empirical curves cross in the region where 
r is high. The significant r may be an artifact, based on the fact that one 
value remains relatively constant while the other value is declining. 

Even for the middle third, the correspondence is far from perfect. This 
is to be expected, given the small number of animals, and the fact that the 
constants are not known. There is also the problem of oscillation. Even 
when the constants proposed by Hull are used, on certain trials the 
theoretical reaction potential associated with the appropriate response, 
when it presumably occurs, is considerably lower than that of the inappropriate 
response. Because of indeterminacy produced by oscillation, there cannot 
be exact correspondence between theoretical predictions and empirical 
data. 

The findings presented in this paper make it clear that Hull’s formulae 
cannot predict responses satisfactorily in a situation involving reversal 
learning. In order to predict reversal performance, the theory must be 
modified to take into account improvement in performance over a series 
of trials. 


SUMMARY 


Eight rats were given 90 instrumental reward trials. On 45 randomly 
distributed trials one of two responses was appropriate; on the remaining 
trials the other response was appropriate. The proportion of responses 
which were inappropriate was calculated for each trial. 

On the basis of Hull’s theory, the theoretical $_{S}E_{R}$ for each type of 
response was calculated for each trial. The theoretical probability for the 
occurrence of the appropriate and inappropriate response on each trial 
was determined on the basis of the relative strengths of the theoretical 
reaction potentials. 

The theoretical probability for the occurrence of the inappropriate 
response was compared with the proportion of the responses which were 
inappropriate. For the middle third of the trials, there was a significant 
relationship between the theoretical and empirical data. The relationship 
did not hold for the initial and terminal thirds of the series. The 
findings are interpreted as indicating that Hull’s formulae cannot predict 
responses satisfactorily in a situation involving reversal learning. 


INTERPRETATION OF DATA AS A FUNCTION OF 
UNITS OF MEASUREMENT¹ 


N. C. FLETCHER² AND A. H. SHEPHARD 
University of Toronto 


A HISTORY of psychology could be written in terms of various views on 
measurement. Perhaps the view most generally accepted is that measure- 
ment is the assignment of numerals to objects or events. While there 
may be some disagreement with this definition, agreement could at least 
be reached about what is done in measuring any attribute. These opera- 
tions include the construction of instruments, the specification of tech- 
niques of application, and the organization of units. Since the mensura- 
tional problem of unit size has received little attention from psycholo- 
gists, it is the purpose of this paper to discuss the implications of various 
unit sizes when interpreting performance measured in terms of achieve- 
ment over time. 

Consider a specific case in which ten male subjects practised con- 
tinuously on the Toronto Complex Coordinator (1) for a period of 20 
minutes on each of two successive days. The subjects were required to 
move a green disc into a lighted red ring by manipulating an airplane- 
type control stick. Each location of the green disc in the red ring was 
called a “match,” the subject’s score being the number of matches made 
in a given period of time. Performance was originally recorded after 
each of sixty 20-second intervals. For purposes of this paper, these 
data were grouped into various interval lengths ranging from 20 seconds 
to 10 minutes. 

As practice was continuous, the data are in no way distorted by divid- 
ing the same 20 minutes of practice into intervals which are multiples of 
the original 20-second period. For convenience these intervals of varying 
lengths will be referred to as “trials.” Thus the original 20-minute 
period can be considered as 20 trials of one minute each, 10 trials of 
two minutes, etc. 
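
The regrouping described above amounts to summing consecutive 20-second scores. A minimal sketch follows, using made-up match counts rather than the authors' data; the function name `regroup` is an illustrative choice.

```python
def regroup(scores, k):
    """Sum consecutive groups of k base-interval scores into longer trials."""
    if len(scores) % k != 0:
        raise ValueError("number of scores must be a multiple of k")
    return [sum(scores[i:i + k]) for i in range(0, len(scores), k)]

# Hypothetical matches per 20-second interval for one 20-minute session.
base_scores = [6 + (j // 10) for j in range(60)]

one_minute_trials = regroup(base_scores, 3)    # 20 trials of one minute
two_minute_trials = regroup(base_scores, 6)    # 10 trials of two minutes
print(len(one_minute_trials), len(two_minute_trials))
```

Because every regrouping sums the same sixty scores, the session total is unchanged; only the number of trials and the size of each trial score differ.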


¹This work was conducted at the University of Toronto under Research Grant 
No. 265 from the Defence Research Board of Canada and is based on a paper 
presented to the Annual Meeting of the Canadian Psychological Association, Ottawa, 
June, 1956. 


²Now at the University of New Brunswick. 

In Figure 1, means of number of matches are plotted against trials 
for several different trial lengths. With longer trials, the initial and 
final levels of performance appear higher and the slope of the curve 
steeper. When the data are plotted in 20-second trials, it would appear 
that little change in performance has occurred beyond the first ten 
minutes of practice on the first day. However, for larger trials, the 
performance appears to increase continuously throughout both days of 
practice. Finally the amount and direction of change in performance 
from the first to the second day changes from an apparent decrement 
in the smaller trials to an apparent increment in the larger trials. 


TABLE I 
t-RATIOS FOR TESTS OF A DAY-TO-DAY DIFFERENCE FOR VARIOUS INTERVALS 


Interval    Pre-rest mean    Post-rest mean    Mean difference    t-ratio 

20 sec.          11.8             10 
40 sec.          24.0             21.7              -2.3           3.15* 
60 sec.          36.3             33.3              -3.0           4.50* 
2 min.           72.2             70.5              -1.7           2.02 
4 min.          143.9            144.5               0.6           0.25 
6 min.          213.9            219.4               5.5           1.91 
8 min.          283.7            295.3              10.6           3.49* 
10 min.         343.0            372.1              19.1           4.70* 

*p = .05, d.f. = 9, t = 2.262. 


Table I presents means for the last trial of the first day and the first 
trial of the second and t-ratios for the day-to-day differences between 
them for the varying interval sizes. For a difference to be significant 
at the 5 per cent level of confidence with 9 d.f., a t of 2.262 is required. 
The tests of these differences show a significant decrement for interval 
sizes 20, 40 and 60 seconds, no significant difference for interval sizes 
2, 4 and 6 minutes, and a significant increment for interval sizes 8 
and 10 minutes. 

Regardless of the size of the interval of measurement, performance 
as indicated by these data increased from interval to interval on a given 
day in a manner that could be represented by a negatively accelerating 
curve. In order to consider the effect of varying interval sizes on data 
which could be represented by curves other than a negatively acceler- 
ated increasing function, hypothetical curves have been constructed. The 
data have been plotted in trials of interval size A and B. Each B trial 
is equal in length to two A trials. Assume performance is linearly 
related to trials instead of curvilinearly as with the present data. Such 
a curve could have a positive, zero, or negative slope. Figure 2 depicts 
curves with a positive and with a zero slope plotted for both the A and 
B interval sizes. In the positive case (lower graph) the magnitude 
of the numbers representing performance is increased as expected. The 
slope of the curve is also increased. The numbers representing performance 
for the zero-slope curve increase also when trial length is increased 
from A to B, but the amount of increase is the same for all trials, hence 
the slope of the curve is unaltered. 

Figure 2. Hypothetical zero and positive slopes, with trial B 
equal to twice trial A in interval size. 
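
The linear case can be worked through in a few lines. This is a sketch under one assumption, that the score on a B trial is the sum of the scores on the two A trials it spans; the symbols $a$, $b$, $y_t$ and $Y_k$ are introduced here for illustration and do not appear in the original.

```latex
% Score on the t-th A trial, linear in trials:
y_t = a + b\,t, \qquad t = 1, 2, 3, \dots
% The k-th B trial spans A trials 2k-1 and 2k:
Y_k = y_{2k-1} + y_{2k} = 2a + b\,(4k - 1)
% Change from one B trial to the next:
Y_{k+1} - Y_k = 4b
```

With $b = 0$ every B score is simply $2a$, so the level rises by the same amount on every trial and the slope stays at zero; with $b > 0$ the level rises and the trial-to-trial slope increases from $b$ to $4b$.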

Hypothetical positively and negatively accelerated curves are shown in 
Figure 3 for A and B sizes of trials. With a trial twice the length (B), 
the increase in initial level is much greater and the slope is considerably 
increased for the negative over the positive acceleration. Such 
results could be anticipated in any case where the largest trial-to-trial 
increases in performance occur early in practice. Both the above effects 
would become more marked as the slope of the curve increased. 

Figure 3. Hypothetical negatively and positively accelerated curves, 
with trial B equal to twice trial A in interval size. 

The two 20-minute periods noted in the sample data were separated 
by a 24-hour rest. It is apparent now that alteration in the day-to-day 
change in performance, as interval size was increased, is attributable 
to the difference in slope of the performance curve prior to and after 
the rest period. As illustrated by the corresponding hypothetical curve, 
the largest increase in the magnitude of the numbers representing per- 
formance occurs over that part of practice where increases in perform- 
ances are largest. At the end of the first day, the slope of the curve is 
much less steep than at the beginning of the second. As interval size 
was increased, the score for an interval increased much more rapidly at 
the beginning of the second day than at the end of the first. Hence for 
these data the nature of the post-rest change in performance is partly a 
function of the size of measuring unit. In addition, the importance of the 
size of the measuring interval in appraising any day-to-day change in 
performance is partly dependent on the general shape of the performance 
curves on each of the two days. 
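
The reversal in the sign of the day-to-day difference can be reproduced with invented curves. In the sketch below both days follow negatively accelerated curves whose constants were chosen purely for illustration (they are not fitted to the authors' data): the first day ends nearly flat, while the second day starts lower but rises steeply. The pre-rest score is the last trial of day 1 and the post-rest score is the first trial of day 2, each trial being k consecutive 20-second intervals.

```python
import math

INTERVALS_PER_DAY = 60   # sixty 20-second intervals in 20 minutes

# Hypothetical matches per 20-second interval (not the authors' data).
day1 = [12 - 5 * math.exp(-j / 10) for j in range(INTERVALS_PER_DAY)]
day2 = [13 - 3 * math.exp(-j / 5) for j in range(INTERVALS_PER_DAY)]

def day_to_day_difference(k: int) -> float:
    """Post-rest minus pre-rest score when a trial is k base intervals long."""
    pre_rest = sum(day1[-k:])    # last trial of the first day
    post_rest = sum(day2[:k])    # first trial of the second day
    return post_rest - pre_rest

for k in (1, 3, 6, 12, 30):
    print(f"{k * 20:4d} s  difference = {day_to_day_difference(k):+.1f}")
```

With these invented curves the shortest trial shows a post-rest decrement and the longest shows a post-rest increment, although the underlying per-interval performance is identical in every case.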

In terms of the above, size of measuring interval is important in the 
experimental design of studies in which performance is observed in terms 
of achievement in a given time interval. An attempt may be made to 
maximize the precision of an experiment by observing larger samples 
or by controlling or randomizing variables. Further attempts at pre- 
cision are sometimes made by increasing the sensitivity of measurement 
through the use of smaller intervals, on the assumption that whatever is 
indicated by the larger interval will be better indicated in the smaller 
interval. However, since data are characteristically influenced by the 
size of measurement interval, quite different phenomena may be observed 
with differing intervals. For example, in our illustrative data the day-to- 
day difference changed from a significant decrement to a significant in- 
crement with increased interval length. Hence the mensurational pro- 
cedures employed in a study may importantly affect the outcome of the 
study. 

Since concepts may be defined in terms of certain immediately observ- 
able phenomena, and since phenomena may be closely related to the 
particular size of measurement interval employed, the concepts may be 
limited in their applicability to a specific study. Differences between the 
results of studies on the same problem may well be attributable to 
differences in measurement intervals. In the definition of such concepts, 
unit size is as much an integral part of the definition as the operation 
techniques set forth in the measure. 

Although this discussion has been conducted largely in terms of a 
learning measure, the implications extend to any mensurational system 
which constitutes an assessment of performance over time. If successive 
determinations of the measure result in a performance change, whether 
it be due to learning or to any other variable, then the measure could be 
similarly affected by interval size. 


THE MEASUREMENT OF ANXIETY: 
A METHODOLOGICAL NOTE 


LEON J. KAMIN 
Queen's University 


THE DEVELOPMENT of a reliable instrument for the measurement of indi- 
vidual differences in anxiety is a matter of concern to many psychologists. 
The volume of research generated by Taylor's Manifest Anxiety Scale 
(4) bears witness to the need. While the concept of anxiety has become 
central to clinical psychological theory, valid and reliable independent 
measures of anxiety have not been adequately developed. The clinical 
research worker must in consequence often rely on the most primitive 
rating-scale techniques. This note suggests a programme and a point of 
view designed to facilitate development of more suitable tests of anxiety. 

The research worker in this field commonly begins with some test 
which he intuitively believes may measure anxiety. Then he selects a 
task which, on the basis of theory, he believes may be affected by 
anxiety. The next step is to administer both test and task to a 
number of experimental subjects, to discover whether anxiety score will 
be related to task performance. The answer to this question will be yes 
or no. Whatever the empirical answer may be, its meaning must be 
ambiguous. Presume that anxiety score is related to task performance. 
Then it is always possible that anxiety score is confounded with some 
other variable, intelligence for example, and that the observed relation 
between test and task is attributable to the confounded variable. Pre- 
sume now that no relation is found between anxiety score and task 
performance. This would not necessarily mean that the test of anxiety 
is invalid, or that the task is not affected by anxiety. It is possible that 
performance on the task is so largely determined by factors other than 
anxiety that no anxiety-task relation can be demonstrated (3). 

The fact that no task exists on which performance is exclusively a 
function of individual differences in anxiety is important. It implies 
that the development of tests of anxiety is linked logically to a categoriza- 
tion of tasks in terms of their susceptibility to anxiety effects. We turn 
first to this problem of categorizing tasks. What guarantee have we that 
a given task should be sensitive to individual differences in anxiety? 
The common denominator of most conceptions of anxiety is that it is 
elicited in situations of danger or threat (2). Thus, we want to be certain 
that the task in some way presents a danger to the subject. There is 
no great difficulty here: physiology and culture being what they are, both 
college sophomores and army recruits can be effectively threatened with 
the danger of an electric shock. The experimenter seeking to manipulate 
anxiety has most commonly used electric shock toward that end. 

The fact is that any task can be made dangerous (anxiety-eliciting) 
by the simple expedient of threatening the subject with the possibility 
of shock while performing the task. We can transform so simple a situa- 
tion as the reaction-time experiment to a situation of danger by inform- 
ing the subject that he will be shocked if he reacts too slowly. We 
cannot, however, necessarily expect a relation between a valid anxiety 
test and performance on the now dangerous task. Such an expectation 
would assume, falsely, that the task is not sensitive to any other indivi- 
dual difference variable. 

We must note that tasks will vary in the degree to which they are 
affected by the introduction of shock-threat. To discover whether a 
given task is a good choice is a simple empirical matter; all that must 
be done is to administer the task to the same subjects twice, once with 
and once without the threat of shock.¹ To the degree that the correlation 
of performance under threat and non-threat conditions is high, 
the task is not a good choice. The correlation coefficient thus obtained 
reflects the influence of factors other than shock-threat on performance. 

The writer has in fact reported such a correlation for simple reaction 
time (1), based on only 67 subjects. The correlation was .68. This 
correlation is certainly not small, but it is smaller than the reliability 
coefficient of simple reaction time. This fact means that the introduc- 
tion of threat brings to bear on task performance new individual differ- 
ence variables. The position of this paper is that it is exactly these 
variables with which a valid test of anxiety must concern itself. That 
is, the test must predict the magnitude of the effect on S of a change 
from non-threat (base-line) to threat conditions. 

We return now to the test of anxiety. When we have found a task 
which, by the technique outlined above, proves very sensitive to anxiety, 
our most reasonable demand on the anxiety test is that it correlate with 
a difference or ratio score of task performance assigned to each S. 
The difference or ratio score will contrast S’s performance under shock- 
threat with his base-line performance—and it is this difference or ratio, 
not the absolute level of performance under threat, which most directly 
reflects the effect of anxiety. 
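
The proposed validation step can be sketched as follows. All scores are invented for illustration: `anxiety_test` stands for any candidate test, and the two performance lists are hypothetical reaction times for the same subjects under base-line and shock-threat conditions. The helper `pearson_r` is written out so that the example is self-contained.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for six subjects (illustration only).
anxiety_test = [12, 25, 8, 30, 17, 21]
baseline_rt = [210.0, 250.0, 190.0, 230.0, 270.0, 220.0]   # ms, non-threat
threat_rt = [205.0, 228.0, 189.0, 204.0, 259.0, 203.0]     # ms, shock-threat

# Each subject's change from base-line to threat, as a difference and a ratio.
difference = [b - t for b, t in zip(baseline_rt, threat_rt)]
ratio = [t / b for b, t in zip(baseline_rt, threat_rt)]

print("test vs difference score:", round(pearson_r(anxiety_test, difference), 2))
print("test vs ratio score:     ", round(pearson_r(anxiety_test, ratio), 2))
print("threat vs non-threat:    ", round(pearson_r(baseline_rt, threat_rt), 2))
```

The first two correlations are the ones a valid anxiety test is asked to show; the third is the threat versus non-threat correlation used earlier to judge how sensitive the task itself is to the threat.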

Thus, we now administer the anxiety test to each S, and each S 
performs the task under threat and non-threat conditions. The question 
now is whether anxiety score is related to the change in S’s performance 
as he moves from one condition to the other. The answer, 
again, is yes or no. Presuming that the answer is yes, there is little 
danger that any confounding variable taints the observed relation. There 
is, for example, no rational likelihood that intelligence is related to a 
difference or ratio score, though it may well be related to S’s absolute 
performance under each condition. Presuming that the answer is no, 
the anxiety test cannot then be assumed to be valid. 

¹Provision must be made in such a study to cancel out possible order effects. 

We now make the happy assumption that a valid test of anxiety is 
discovered by this means. There arise at once a number of compli- 
cating questions, but they are all straightforward and empirical. First, 
is a test of anxiety which has been validated against Task A also valid 
when the criterion is the threat-induced change in performance on Task 
B? This question in effect asks whether there are different kinds of 
anxiety, specific to different kinds of tasks. Note that, if a test were 
consistently validated against only one task, it would by our criteria be 
a valid test of one “kind” of anxiety. Second, is a test of anxiety which 
has been validated against Task A when the threat of shock is employed 
also valid, again using Task A, when the threat, for example, of a blow 
to S’s prestige is employed? This question asks whether there are 
different kinds of anxiety specific to different kinds of threat. When we 
begin to ask such questions seriously, we manoeuvre ourselves into the 
position of undertaking large-scale factorial studies. Thus, a large body 
of Ss would be given a large number of tasks, under non-threat con- 
ditions and under conditions of several kinds of threat. Presumably, 
factors would emerge from such a study which would reflect several 
different “anxieties.” 

The factors which do or do not emerge from such a study will be of 
inherent interest, aside from the development of tests. We do not yet 
know the answer even to so simple a question as this: if the same Ss 
were given the same task three times, once without threat, once with 
shock-threat, and once with “ego” threat, would the difference scores 
obtained by contrasting non-threat and shock-threat performance cor- 
relate with those obtained by contrasting non-threat and ego-threat per- 
formance? Problems of this nature—simple empirical problems—are 
fundamental to our conception of anxiety. 

The research outlined in this note is in no way dependent on clinical 
definitions of anxiety, or on theories of the role anxiety plays in neurosis 
and psychosis. We can wonder, however, whether tests of anxiety 
developed as we have suggested would differentiate clinically “anxious” 
patients from normals. The logic of difference or ratio scores has a 
special relevance here in controlling the confounding variables other than 
anxiety which are undoubtedly associated with psychopathology. We 
may speculate also on whether clinically anxious people would be sen- 
sitive to one kind of threat rather than another, and whether differ- 
ences in this regard may exist among diagnostic categories. If a thorough 
study along the lines indicated showed that experimentally valid tests 
of anxiety were not related to psychiatric diagnoses of anxiety, we 
would at least know that the kinds of threat manipulable by experi- 
menters and the “anxiety” at the heart of clinical theory have no mutual 
relevance. 


PSYCHOLOGISTS occasionally play the popular game of choosing the one 
book they would take with them for an extended and isolated stay on 
a desert island. When the choice is limited to a psychological volume,
symptoms of conflict are readily apparent. There is no necessity to use
an anxiety check list to realize that restriction has imposed indecision, 
while unrestricted choices range easily from the Bible through Dante to 
the dictionary. In general, the psychological volumes chosen seem to 
fall into two categories: those of many pages and references representing 
an encyclopaedic safety from boredom; and those of theory, providing 
the stimulation of inconclusiveness. The appearance of F. H. Allport’s 
volume, Theories of Perception and the Concept of Structure (1), has 
afforded an amazing desert island choice, combining both categories. 

In 700 tightly packed pages the author has attempted three major 
tasks: the organized description of 13 theories of perception, the ab- 
straction of generalizations from them, and the introductory statement 
of a concept of structure. The success of the first two tasks has been 
amply documented in the reviews to date. The success of the third, 
neglected in most comments, is fundamentally dependent upon the 
extent to which Allport’s concept of structure stimulates original 
experimentation on, and controlled observation of, human behaviour. It 
is not the purpose of this discussion to review the text or to assess the 
validity of the concept by citing aspects of behaviour which do not 
appear to be encompassed by it. Allport states that he is using the per- 
ceptual process as a testing ground for his concept (it becomes a theory 
in the final chapters), and he is certainly aware of the ramifications of 
its use in the systematic explanation of other phenomena. The present 
purpose is to suggest several directions of investigation in traditional 
psychological areas that are stimulated by Allport’s approach. 

His concept of structure is very difficult to summarize. Allport still 
writes, after a lapse of some years, as a social behaviourist. He performs 
something of a tour de force in constructing a structured model of per- 
ception without a bibliographic reference to Snygg, Rogers or Lecky. 
His anxious appreciation of Hebb’s model and of the cyberneticists’ 
contribution contrasts strangely with his emphasis on what used to be 
called “mental set.” In brief, Allport attempts to bridge the problem of
mechanism and vitalism with the concept of a flexible functioning struc- 
ture. In the simplest terms (and his concluding chapters are anything 
but simple) the author suggests a model based upon ongoing processes, 
such as receptor activities and neural impulses, and events. The latter 
are the “occasions” or junction points between one ongoing process and 
the next, thereby forming the psychological sequence. 

For example, in the sequence “cold water touches tooth and hurts” 
we would have: 


Water approaches tooth -> Tooth -> Water touches tooth -> Afferent nerve -> Synapse -> etc.

[Diagram labels beneath the sequence, partly illegible in the source: ongoing process; stimulus (illegible); ongoing receptor process; event; ongoing process; event.]

This apparently simple approach to a functioning structure is com- 
plicated if one carries the example to the resultant behaviour of the 
tongue touching the tooth to warm it and to remove the pain event or sen- 
sation, thus restoring the ongoing receptor process to its initial state. 
Implied in the description of the sequence are two major considerations:
the alteration of ongoing processes, rather than the triggering into 
motion of a body initially at rest (as in the classical S-R formula); and 
the organization of a complexity or network of ongoing processes into a 
series of event-points, presumably in the cortex. The series of event- 
points effects the alteration of the ongoing processes of behaviour which 
we can objectively or subjectively observe, and which we call the 
response. In this fashion Allport attempts the solution of the mechanism- 
vitalism conundrum by postulating a functioning structure, responding 
to the ongoing processes traditionally named the stimuli by alteration 
in function but not necessarily in structure. He is proposing, in a sense, 
a quantum physics for psychology. 

The above is by no means a comprehensive outline of Allport’s con- 
cept of structure. It does suggest, however, that hidden in the final 
two chapters of the text are implications which the author has not as 
yet pursued in other than technical terms. Without attempting to trace 
through the complexities of the three-dimensional model necessitated 
by this concept of structure, the general stimulus value of the concept 
is immediately apparent when applied to representative psychological 
phenomena. 

For example, the baffling and basic problem of memory may be 
opened to new investigation under Allport’s impetus. Hartley’s
vibratiuncles, Locke’s tabulae, Watson’s conditioned responses, the
neurophysiologists’ closed circuits and feedbacks, and even Freud’s
unconscious have had one thing in common: the concept of a dormant
organism stirred into action by a stimulus. Allport is essentially suggesting 
that memory is a process of altering ongoing receptor processes through 
a series of stimulus events. Such alterations should logically remain 
until displaced by other stimulus events. The phenomena we call 
memory and forgetting could conceivably be approached from the 
viewpoint of displacement of ongoing processes. The laws of recency 
and primacy as well as the effects of traumatic experience might achieve 
new relevance. 

In the companion field of learning, utilization of past experience in 
the solution of a present problem is a primary concept. This is a 
technical way of saying that a person remembers and applies. He does 
this on a basis of subjective rather than mathematical probability— 
what worked the last few times should work again. The concept of a 
recently altered ongoing process continuing to affect behaviour until dis- 
placed by new and striking stimulus events is a useful one. It is also 
possible that the preparedness of mental set implicit in an ongoing 
process could form an explanatory bridge between learning and per- 
sonality phenomena. 

These notions have striking implications for the clinical area. Apart 
from the obvious contemplation in the above terms of the effect on psy- 
chosis of the heroic therapies, electro-shock, drug, and surgical, a con- 
sideration of the psychotic behaviour as non-displaceable behaviour is of 
interest. It would be possible to consider abnormal behaviour in terms of 
rigidity of ongoing processes, resistance to stimulus event, or physiological 
disruption of the event sequence. In this connection, one of the interesting 
features of Allport’s approach is that it leaves the time element, the 
speed with which an individual alters, to the realm of individual differ- 
ences. Speed becomes a function of the complex inter-relations of 
stimulus events, ongoing processes and responsive events, and con- 
ceivably varies from one individual to the next while resulting in the same 
overt change in behaviour. A consideration of changes in psychotic 
behaviour, freed from the traditional fixed-time dimension of disease 
change, would allow us to equate patients on a basis of change in 
syndrome, regardless of the time taken to change, rather than on a 
cross-section of symptoms at a given point in time. In other words, a 
patient may progress through a series of symptoms in one week that 
are similar in sequence to those of other patients whose progression 
takes from one to six months. A logic of illness-change based on a 
flexible time base might well emerge. 

One final illustration of Allport’s stimulus value (the criterion on which 
he is being judged in this discussion): the concept of self. At first
thought, the concept of structure proposed would seem to be antagon- 
istic to a phenomenal concept of self (as would Allport in his total lack 
of reference to it). However, juxtaposition of the two concepts in our 
thinking produces some interesting emphases. Historically, writers on 
the self have at various times stressed object-agent confusion and 
clarification, self-induced sensations, reflection of the behaviour of others, 
and judgment-making. Theorists in this area have inevitably run into the 
cereal-box problem of the man holding the picture of the man holding 
the picture of the man. ... The problem, like that found in the 
systematic description of objective observations, is one of the agent 
viewing the agent as object. Allport may well have provided us with a 
spherical model in which the circular aspects of self-awareness can be 
meaningfully described in terms of ongoing processes. Stimulus events
(neurophysiological initially, external-social and internal-cognitive later 
on) could be considered as modifying the function of the processes, 
subtly at times and more dramatically at others, determining at any 
given time our perception of self. Self-awareness might then be 
described in genetic terms, incorporating physiological and social aspects 
in a perceptual framework. 

This discussion has attempted to illustrate the stimulus value of All- 
port’s concept of structure. The concept’s uniqueness depends upon its 
stress on the alteration of function of ongoing processes, rather than 
upon the stimulation of responses. It provides a flexible model allowing
the inter-relation of social and cognitive data; it has deep systematic
implications; it should start some argument; and it is, in short, a good
desert island selection.